With the increase in agent-based applications, there are now agent systems that support concurrent
client accesses. The ability to process large volumes of simultaneous requests is critical in
many such applications. In such a setting, the traditional approach of serving these requests one 
at a time via queues (e.g. FIFO queues, priority queues) is insufficient. Alternative models are 
essential to improve the performance of such heavily loaded agents. In this paper, we propose a set 
of cost-based algorithms to optimize and merge multiple requests submitted to an agent. In order 
to merge a set of requests, one first needs to identify commonalities among such requests. First, 
we provide an application independent framework within which an agent developer may specify 
relationships (called invariants) between requests. Second, we provide two algorithms (and var- 
ious accompanying heuristics) which allow an agent to automatically rewrite requests so as to 
avoid redundant work — these algorithms take invariants associated with the agent into account. 
Our algorithms are independent of any specific agent framework. We implemented both
algorithms on top of the IMPACT agent development platform, and on top
of a (non-IMPACT) geographic database agent. Based on these implementations, we conducted
experiments and show that our algorithms are considerably more efficient than methods that use
the A* algorithm.

Categories and Subject Descriptors: I.2.12 [Artificial Intelligence]: Distributed AI - Intelligent
Agents; I.2.3 [Artificial Intelligence]: Deduction and Theorem Proving; D.2.12 [Software
Engineering]: Interoperability; H.2.4 [Database Management]: Heterogeneous Databases

General Terms: Multi-Agency, Logical Foundations, Programming 

Additional Key Words and Phrases: Heterogeneous Data Sources, Multi-Agent Reasoning



1. MOTIVATION AND INTRODUCTION 

A heavily loaded agent is one that experiences a large volume of service requests 
and/or has a large number of conditions to track on behalf of various users. The 
traditional model for servicing requests is via one kind of queue or the other (e.g. 
FIFO, LIFO, priority queue, etc.). For instance, a company may deploy a PowerPoint 
agent ppt that automatically creates PowerPoint presentations for different users 
based on criteria they have registered earlier. The finance director may get the latest
budget data presented to him, a shop worker may get information on the latest work 
schedules for him, and the CEO may get information on stock upheavals. 

If the ppt agent has thousands of such presentations to create for different users, 
it may well choose to exploit "redundancies" among the various requests to enhance 
its own performance. Hence, rather than sequentially creating a presentation for 
the CEO, then one for the finance director, then one for the marketing manager, 
then one for the shop manager, etc., it may notice that the finance director and 
CEO both want some relevant financial data — this data can be accessed and a 
PowerPoint page created for it once, instead of twice. Likewise, a heterogeneous 
database agent hdb tracking inventory information for thousands of users may well 
wish to exploit the commonality between queries such as 

Find all suppliers who can provide 1000 automobile engines by June 
25, 2003 and Find all suppliers who can provide 1500 VX2 automobile 
engines by June 21, 2003. 

In this case, the latter query can be executed by using the answer returned by the 
first query, rather than by executing the second query from scratch. This may be 
particularly valuable when the hdb agent has to access multiple remote supplier 
databases — by leveraging the common aspects of such requests, the hdb agent can 
greatly reduce load on the network and the time taken to jointly process these two 
requests. 



The same problem occurs in yet another context. [Subrahmanian et al. 2000]
have described a framework called IMPACT within which software agents may be 
built on top of arbitrary data structures and software packages. In their framework, 
an agent manipulates a set of data structures (including a message box) via a set 
of well defined functions. The state of the agent at a given point in time consists 
of a set of objects in the agent's data structures. The agent also has a set of 
integrity constraints. When the agent state changes (this may happen if a message 
is received from another agent, a shared workspace is written by another agent or 
entity, a clock tick occurs, etc.), the agent must take some actions that cause the 
state to again be consistent with the integrity constraints. Hence, each agent has an 
associated set of actions (with the usual preconditions and effects), and an agent 
program which specifies under what conditions an agent is permitted to take an 
action, under what conditions it is obliged to take an action, under what conditions 
it is forbidden from taking an action, and under what conditions an action is in fact 
taken. [Eiter et al. 2000] have shown how (under some restrictions) such an agent
program may be compiled into a set of conditions to be evaluated at run-time over
the agent's state. When the agent state changes, then for each action a, one such 
condition needs to be evaluated over the state in order to determine which instances 
of that action (if any) need to be performed. Hence, numerous such conditions need 
to be simultaneously evaluated so that the agent can decide what actions to take 
so as to restore consistency of the state with the integrity constraints. 

Therefore, in this paper, we consider the following technical problem. Suppose
an agent is built on top of heterogeneous data structures (e.g. using methods
described in agent frameworks such as [Eiter et al. 1999;
Subrahmanian et al. 2000; Dix et al. 2000; Dix et al. 2001; Dix et al. 2000]).
Suppose the agent is confronted with a set S of requests. How should the
agent process these requests so as to reduce the overall load on itself?

In the case of the ppt agent for example, this capability will allow the agent to
recognize the fact that many presentations requested by different clients require
common financial data to be computed and/or analyzed, and hence that performing
this work once instead of many times will enhance performance. Likewise,
in the case of the hdb agent, merging the two queries about automobile engines
presented earlier allows the agent to reduce load on itself, thus allowing it to respond 
to other queries faster than by queuing. 

The paper is organized as follows. Section 2 provides the basic definitions and
some preliminary results that are employed throughout the paper.
Section 3 presents our architecture. The sections that follow discuss
the development phase and the deployment phase components, respectively, then the
experiments, and finally related work and conclusions.

2. PRELIMINARIES 

All agents manipulate some set T of data types and manipulate these types via
some set of functions (application program interface functions). The input/output
types of functions are known. If d is the name of a data structure (or even a
software package), and f is an n-ary function defined in that package, then

$d : f(a_1, \ldots, a_n)$

is a code call. This code call says

Execute function f as defined in data structure/package d on the stated
list of arguments.

We assume this code call returns as output a set of objects; if an atomic object
is returned, it can be coerced into a set. For instance, if we consider a commonly
used data structure called a quad-tree [Samet 1989] for geographic reasoning,
quadtree : range((20, 30), T, 40) may be a code call that says find all objects in the
quadtree the root of which is pointed to by T which are within 40 units of location
(20, 30); this query returns a set of points.
An atomic code call condition is an expression of the form

$\mathit{in}(t, d : f(a_1, \ldots, a_n))$

which succeeds if t is in the set of answers returned by the code call in question.
For example, in(t, excel : chart(excelFile, rec, date)) is an atomic code call condition
that succeeds if t is a chart plotting rec with respect to date in the excelFile.
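
To make the two notions concrete, the following is a minimal Python sketch of a code call and of an atomic code call condition. The names `CodeCall`, `execute`, `in_condition`, the `Registry` mapping, and the function `toy_range` are assumptions made for illustration; in particular, `toy_range` scans a tuple of points and only stands in for a real quad-tree range query.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set, Tuple

# Hypothetical registry: maps (package d, function f) to a Python callable.
Registry = Dict[Tuple[str, str], Callable[..., Any]]


@dataclass(frozen=True)
class CodeCall:
    """A code call d:f(a_1, ..., a_n) whose arguments are all ground."""
    package: str
    function: str
    args: Tuple[Any, ...]


def execute(cc: CodeCall, registry: Registry) -> Set[Any]:
    """Run the code call and coerce an atomic result into a singleton set."""
    result = registry[(cc.package, cc.function)](*cc.args)
    if isinstance(result, (set, frozenset, list)):
        return set(result)
    return {result}


def in_condition(t: Any, cc: CodeCall, registry: Registry) -> bool:
    """Atomic code call condition in(t, d:f(...)): succeeds iff t is an answer."""
    return t in execute(cc, registry)


def toy_range(center, points, radius):
    """Stand-in for quadtree:range; scans a tuple of points, not a quad-tree."""
    return {p for p in points if math.dist(center, p) <= radius}


POINTS = ((20, 30), (25, 33), (100, 100))
REGISTRY: Registry = {
    ("quadtree", "range"): toy_range,
    ("clock", "now"): lambda: 7,  # returns an atomic object, coerced to {7}
}
call = CodeCall("quadtree", "range", ((20, 30), POINTS, 40))
```

With these toy definitions, `in_condition((25, 33), call, REGISTRY)` succeeds because the point lies within 40 units of (20, 30), while the same test for (100, 100) fails.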

We assume that for each type $\tau$ manipulated by the agent, there is a set $\mathit{root}(\tau)$ of
"root" variable symbols ranging over $\tau$. In addition, suppose $\tau$ is a complex record
type having fields $f_1, \ldots, f_n$. Then, for every variable X of type $\tau$, we require that
$X.f_i$ be a variable of type $\tau_i$, where $\tau_i$ is the type of field $f_i$. In the same vein, if
$f_i$ itself has a sub-field g of type $\gamma$, then $X.f_i.g$ is a variable of type $\gamma$, and so on.
The variables $X.f_i$, $X.f_i.g$, etc. are called path variables. For any path variable Y of
the form X.path, where X is a root variable, we refer to X as the root of Y, denoted
by $\mathit{root}(Y)$; for technical convenience, $\mathit{root}(X)$, where X is a root variable, refers to
itself. If S is a set of variables, then $\mathit{root}(S) = \{\mathit{root}(X) \mid X \in S\}$.

Convention 2.1. From now on, we use lower case letters ($a, b, c, c_1, \ldots$) to denote
constants and upper case letters ($X, Y, Z, X_1, \ldots$) to denote variables. When it is clear
from context, we will also use lower case letters like s, t as metavariables ranging
over constants, variables or terms.

A code call condition (ccc) may now be defined as follows: 

(1) Every atomic code call condition is a code call condition. 

(2) If s and t are either variables or objects, then s = t is an (equality) code call 
condition. 

(3) If s and t are either integers/real valued objects, or are variables over the
integers/reals, then $s < t$, $s > t$, $s \le t$, and $s \ge t$ are (inequality) code call
conditions.

(4) If $\chi_1$ and $\chi_2$ are code call conditions, then $\chi_1 \,\&\, \chi_2$ is a code call condition.

Code call conditions provide a simple, but powerful language syntax to access het- 
erogeneous data structures and legacy software code. 

Example 2.1. [Sample ccc] The code call condition

in(FinanceRec, rel : select(financeRel, date, "=", "11/15/99")) &
FinanceRec.sales > 10K &
in(C, excel : chart(excelFile, FinanceRec, day)) &
in(Slide, ppt : include(C, "presentation.ppt"))

is a complex condition that accesses and merges data across a relational database,
an Excel file, and a PowerPoint file. It first selects all financial records associated
with "11/15/99": this is done with the variable FinanceRec in the first line. It
then keeps only those records having sales of more than 10K (second line). Using the
remaining records, an Excel chart is created with day of sale on the x-axis (third line) and the
resulting chart is included in the PowerPoint file "presentation.ppt" (fourth line).
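
The left-to-right reading of this example can be sketched as a small Python evaluator. Everything below is an illustrative assumption rather than part of any agent platform: the tuple encoding of conjuncts, the functions `resolve` and `solve`, the in-memory `TABLE`, and the three lambda stand-ins for `rel:select`, `excel:chart` and `ppt:include`. Strings beginning with an upper-case letter are treated as variables, following Convention 2.1, and 10K is written as 10000.

```python
import operator
from typing import Any, Dict, Iterator, Sequence

OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "=": operator.eq}


def resolve(term: Any, binding: Dict[str, Any]) -> Any:
    """Look up variables (capitalised strings, possibly path variables such as
    'FinanceRec.sales'); every other term is treated as a constant."""
    if isinstance(term, str) and term[:1].isupper():
        head, *path = term.split(".")
        value = binding[head]
        for field in path:
            value = value[field]
        return value
    return term


def solve(conds: Sequence[tuple], registry, binding=None) -> Iterator[Dict[str, Any]]:
    """Evaluate the conjuncts left to right, yielding every satisfying binding."""
    binding = binding or {}
    if not conds:
        yield dict(binding)
        return
    first, rest = conds[0], conds[1:]
    if first[0] == "in":                      # ("in", Var, (d, f), args)
        _, var, call, args = first
        for obj in registry[call](*(resolve(a, binding) for a in args)):
            yield from solve(rest, registry, {**binding, var: obj})
    else:                                     # ("cmp", s, op, t)
        _, left, op, right = first
        if OPS[op](resolve(left, binding), resolve(right, binding)):
            yield from solve(rest, registry, binding)


# Toy data and toy stand-ins for the three packages (all hypothetical).
TABLE = [{"id": 1, "date": "11/15/99", "sales": 25000},
         {"id": 2, "date": "11/15/99", "sales": 4000},
         {"id": 3, "date": "11/16/99", "sales": 90000}]
REGISTRY = {
    ("rel", "select"): lambda rel, attr, op, val: [r for r in TABLE if OPS[op](r[attr], val)],
    ("excel", "chart"): lambda file, rec, axis: [f"chart-of-{rec['id']}"],
    ("ppt", "include"): lambda chart, file: [f"{file}[{chart}]"],
}
CCC = [
    ("in", "FinanceRec", ("rel", "select"), ("financeRel", "date", "=", "11/15/99")),
    ("cmp", "FinanceRec.sales", ">", 10000),
    ("in", "C", ("excel", "chart"), ("excelFile", "FinanceRec", "day")),
    ("in", "Slide", ("ppt", "include"), ("C", "presentation.ppt")),
]
solutions = list(solve(CCC, REGISTRY))
```

In this toy state only record 1 survives both the date selection and the sales filter, so `solutions` holds a single binding of FinanceRec, C and Slide. Reordering the conjuncts so that the chart call comes first would raise a `KeyError` in `resolve`, which is the situation the notion of evaluability is meant to rule out.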

In the above example, it is very important that the first code call be evaluable. If,
for example, the constant financeRel were a variable, then

rel : select(FinanceRel, date, "=", "11/15/99")

would not be evaluable, unless there were another condition instantiating this variable.
In order to come up with a notion of evaluability, we need the following
notion.

Definition 2.2 (Dependent ccc's). For an atomic code call condition of the form
$\mathit{in}(X_i, cc_i)$ we define $\mathit{root}(cc_i) = \{\mathit{root}(Y) \mid Y \text{ occurs in } cc_i\}$ and
$\mathit{root}(X_i) = \{\mathit{root}(Y) \mid Y \text{ occurs in } X_i\}$. For an (in-)equality code call condition
$ccc_{in/eq}$ we define $\mathit{var}(ccc_{in/eq}) = \{\mathit{root}(Y) \mid Y \text{ occurs in } ccc_{in/eq}\}$.

A code call condition $\chi_j$ is said to be dependent on $\chi_i$ iff the following holds:

(1) Case 1: $\chi_i$ is of the form $\mathit{in}(X_i, cc_i)$.

(a) If $\chi_j$ is an atomic code call condition of the form $\mathit{in}(X_j, cc_j)$ then
$\mathit{root}(X_i) \subseteq \mathit{root}(cc_j)$.

(b) If $\chi_j$ is an equality or inequality code call condition of the form $s_1 \; op \; s_2$,
then either $s_1$ is a variable and $\mathit{root}(s_1) \in \mathit{root}(X_i)$ or $s_2$ is a variable and
$\mathit{root}(s_2) \in \mathit{root}(X_i)$ or both.

(2) Case 2: $\chi_i$ is an (in-)equality code call condition.

(a) If $\chi_j$ is an atomic code call condition of the form $\mathit{in}(X_j, cc_j)$ then
$\mathit{var}(\chi_i) \subseteq \mathit{root}(cc_j)$.

(b) If $\chi_j$ is an equality or inequality code call condition of the form $s_1 \; op \; s_2$,
then either $s_1$ is a variable and $\mathit{root}(s_1) \in \mathit{var}(\chi_i)$ or $s_2$ is a variable and
$\mathit{root}(s_2) \in \mathit{var}(\chi_i)$ or both.

Example 2.3. [Dependency among ccc's] The ccc $\chi_1$ : FinanceRec.sales > 10K
is dependent on the atomic code call condition

$\chi_2$ : in(FinanceRec, rel : select(financeRel, date, "=", "11/15/99")),

because $\mathit{root}(\text{FinanceRec.sales}) \in \mathit{root}(\text{FinanceRec})$. Similarly, the atomic code
call condition $\chi_3$ : in(C, excel : chart(excelFile, FinanceRec, day)) is dependent on
the atomic code call condition $\chi_2$, as the root variable FinanceRec which appears
as an argument in the code call of $\chi_3$ is instantiated in $\chi_2$.

Definition 2.4 (Code Call Evaluation Graph (cceg) of a ccc). A code call evaluation
graph for a code call condition $\chi = \chi_1 \,\&\, \cdots \,\&\, \chi_n$, $n \ge 1$, where each $\chi_i$
is either an atomic, equality or inequality code call condition, is a directed graph
$\mathit{cceg}(\chi) = (V, E)$ where:

(1) $V =_{def} \{\chi_i \mid 1 \le i \le n\}$,

(2) $E =_{def} \{(\chi_i, \chi_j) \mid \chi_j \text{ is dependent on } \chi_i \text{ and } 1 \le i \ne j \le n\}$.

Example 2.5. Figure 1 shows an example code call evaluation graph for the code
call condition of Example 2.1.

If finRel were a variable FinRel, then the ccc would depend on the equality ccc
FinRel = finRel.

Using the dependency relation on the constituents of a code call condition, we 
are now able to give a precise description of an evaluable ccc. 

Definition 2.6 (Evaluability of a ccc, $\mathit{var}_{base}(ccc)$). A code call evaluation graph
is evaluable iff

(1) It is acyclic.

(2) For all nodes $\chi_i$ with in-degree 0 the following holds:

(a) If $\chi_i$ is an atomic code call condition of the form $\mathit{in}(X_i, d : f(d_1, \ldots, d_n))$,
then each $d_j$, $1 \le j \le n$, is ground.

(b) If $\chi_i$ is an equality or inequality code call condition of the form $s_1 \; op \; s_2$,
then either $s_1$ or $s_2$ or both are constants.

A code call condition ccc is evaluable iff it has an evaluable code call evaluation
graph.

For an evaluable ccc, we denote by $\mathit{var}_{base}(ccc)$ the set of all variables occurring
in nodes having in-degree 0. The set $\mathit{var}(ccc)$ of all variables occurring in ccc may
be a superset of $\mathit{var}_{base}(ccc)$.

[Figure 1: directed graph over the code call conditions of Example 2.1. Surviving node labels: in(FinanceRec, rel : select(finRel, date, "=", "11/15/99")) and in(Slide, ppt : include(C, "presnt.ppt")).]

Fig. 1. The code call evaluation graph of Example 2.1



Example 2.7. The code call evaluation graph in Figure 1 is evaluable because the
atomic code call condition of the only node with in-degree 0 has ground arguments
in its code call and it contains no cycles.



In [Eiter et al. 2000] the notion of a safe code call was defined to provide the
necessary means to check if a given code call is evaluable. It defines a linear ordering
of atomic, equality and inequality code calls within a given code call condition
in such a way that when executed from left to right the code call condition is
executable. Before tying our new notion of graph-evaluability to the notion of
safety, we recapitulate the definition of safety from [Eiter et al. 2000].



Definition 2.8 (Safe Code Call (Condition)).
A code call $d : f(arg_1, \ldots, arg_n)$ is safe iff each $arg_i$ is ground. A code call condition
$\chi_1 \,\&\, \ldots \,\&\, \chi_n$, $n \ge 1$, is safe iff there exists a permutation $\pi$ of $\chi_1, \ldots, \chi_n$
such that for every $i = 1, \ldots, n$ the following holds:

(1) If $\chi_{\pi(i)}$ is an equality/inequality $s_1 \; op \; s_2$, then

- at least one of $s_1, s_2$ is a constant or a variable X such that $\mathit{root}(X)$ belongs
to $RV_{\pi}(i) = \{\mathit{root}(Y) \mid \exists j < i \text{ s.t. } Y \text{ occurs in } \chi_{\pi(j)}\}$;

- if $s_i$ is neither a constant nor a variable X such that $\mathit{root}(X) \in RV_{\pi}(i)$, then
$s_i$ is a root variable.

(2) If $\chi_{\pi(i)}$ is an atomic code call condition of the form $\mathit{in}(X_{\pi(i)}, cc_{\pi(i)})$, then
the root of each variable Y occurring in $cc_{\pi(i)}$ belongs to $RV_{\pi}(i)$ and either $X_{\pi(i)}$
is a root variable, or $\mathit{root}(X_{\pi(i)})$ is from $RV_{\pi}(i)$.

We call the permutation $\pi$ with the above properties a witness to the safety.

Intuitively, a code call condition is safe if we can reorder the atomic code call conditions
occurring in it in a way such that we can evaluate these atoms left to right, assuming
that root variables are incrementally bound to objects.
Example 2.9. Consider the code call condition

in(FinanceRec, rel : select(financeRel, data, "=", "11/15/99")) &
in(C, excel : chart(excelFile, FinanceRec, day)).

This code call condition is safe as it meets both of the safety requirements. However,
the following code call condition is not safe:

in(FinanceRec, rel : select(financeRel, data, "=", "11/15/99")) &
in(C, excel : chart(ExcelFile, FinanceRec, day)).

This is because there is no permutation of these two atomic code call conditions
which allows safety requirement 1 to be met for the variable ExcelFile.

As a cceg is acyclic for any evaluable graph, ccegs determine a partial ordering
$\preceq$ on the $\chi_i$'s:

$\chi_i \preceq \chi_j$ if and only if $(\chi_i, \chi_j) \in E$.

Hence, we may abuse notation and talk about topological sorts [Knuth 1997] of
a graph to mean the topological sort of the associated partial ordering. Recall
that given a partially ordered set $(S, \le)$, a topological sorting of that set yields a
linearly ordered set $(S, \preceq)$ such that $(\forall x, y \in S)\; x \le y \rightarrow x \preceq y$. In the same vein,
a topological sort of a directed acyclic graph (dag) is a linear ordering of nodes in
the graph, such that if there exists an edge $(v_1, v_2)$ in the graph, then $v_1$ precedes
$v_2$ in the topological sort.

Theorem 2.10. $\pi$ is a witness to the safety of $\chi$ if and only if $\pi$ is a valid topological
sort of the cceg of $\chi$.
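
As an illustration of the theorem, the sketch below derives one evaluation order from a cceg with Python's standard `graphlib` module (Python 3.9 or later). The node names `chi1` to `chi4` and the edge list are assumptions that mirror the edges listed for Example 2.1; `evaluation_order` and `is_valid_order` are hypothetical helper names.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Iterable, List, Optional, Sequence, Tuple

Edge = Tuple[str, str]  # (chi_i, chi_j): chi_j is dependent on chi_i


def evaluation_order(nodes: Iterable[str], edges: Iterable[Edge]) -> Optional[List[str]]:
    """Return one topological sort of the graph, or None if it has a cycle."""
    sorter = TopologicalSorter({n: set() for n in nodes})
    for src, dst in edges:
        sorter.add(dst, src)  # dst must come after its predecessor src
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


def is_valid_order(order: Sequence[str], edges: Iterable[Edge]) -> bool:
    """Check that every edge (v1, v2) has v1 before v2 in the ordering."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[a] < position[b] for a, b in edges)


NODES = ["chi1", "chi2", "chi3", "chi4"]
EDGES = [("chi1", "chi2"), ("chi1", "chi3"), ("chi3", "chi4")]
order = evaluation_order(NODES, EDGES)
```

Any ordering accepted by `is_valid_order` can serve as a left-to-right evaluation order for the conjuncts; a cyclic graph yields `None`, matching the requirement that an evaluable cceg be acyclic.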

The algorithm Create-cceg (Figure 2) takes a code call condition $\chi$ and creates
an evaluable code call evaluation graph if $\chi$ is evaluable; otherwise it returns NIL.
The following example demonstrates the working of this algorithm for the code
call condition of Example 2.1.

Example 2.11. Let

$\chi_1$ : in(FinanceRec, rel : select(financialRel, date, "=", "11/15/99")),
$\chi_2$ : FinanceRec.sales > 10K,
$\chi_3$ : in(C, excel : chart(excelFile, FinanceRec, day)), and
$\chi_4$ : in(Slide, ppt : include(C, "presentation.ppt")).

First, $L = \{\chi_1, \chi_2, \chi_3, \chi_4\}$ and $L' = Var = E = \emptyset$. We first create a node for each of
the four code call conditions. $Ok = \{\chi_1, \chi_2\}$, as all arguments in the code call of $\chi_1$
are ground, and 10K is a constant in $\chi_2$. Next, we create the edge $(\chi_1, \chi_2)$, because
$\chi_2$ depends on $\chi_1$. Then $L = \{\chi_3, \chi_4\}$, $L' = \{\chi_1, \chi_2\}$ and $Var = \{\text{FinanceRec}\}$.
In the first iteration of the while loop $\Psi = \{\chi_3\}$, as $\chi_3$ depends on $\chi_1$ and all
variables in $\chi_3$ (FinanceRec) are in Var. Var becomes {FinanceRec, C} and we
create the edge $(\chi_1, \chi_3)$. Now, $L = \{\chi_4\}$ and $L' = \{\chi_1, \chi_2, \chi_3\}$. In the second iteration
of the while loop $\Psi = \{\chi_4\}$, since $\chi_4$ depends on $\chi_3$ and all variables in $\chi_4$ (namely
{C}) are in Var. This time, Var becomes {FinanceRec, C, Slide}, and we add
the edge $(\chi_3, \chi_4)$ to the graph. Now L becomes the empty set and the algorithm
returns the code call evaluation graph given in Figure 1.
Create-cceg($\chi$)

/* Input: $\chi : \chi_1 \,\&\, \chi_2 \,\&\, \ldots \,\&\, \chi_n$ */
/* Output: NIL, if $\chi$ is not evaluable */
/*         a cceg CCEG = (V, E), if $\chi$ is evaluable */

$L := \{\chi_1, \chi_2, \ldots, \chi_n\}$;
$L' := \emptyset$;
$Var := \emptyset$;
$E := \emptyset$;
$V := \{\chi_i \mid 1 \le i \le n\}$;
$Ok := \{\chi_i \mid \chi_i$ is either of the form $\mathit{in}(X, d : f(args))$ where args is ground or
        of the form $s_1 \; op \; s_2$, where either $s_1$ or $s_2$ or both are constants$\}$;
for all pairs $(\chi_i, \chi_j)$, $\chi_i, \chi_j \in Ok$, such that $\chi_j$ is dependent on $\chi_i$
    create an edge $(\chi_i, \chi_j)$ and add it to E;
$Var := Var \cup \{\mathit{root}(X_i) \mid \mathit{in}(X_i, d : f(args)) \in Ok\}$;
$L := L - Ok$;
$L' := L' \cup Ok$;
while (L is not empty) do
    $\Psi := \{\chi_i \mid \chi_i \in L$ and all variables in $\chi_i$ are in Var and
            $\exists \chi_j \in L'$ such that $\chi_i$ depends on $\chi_j\}$;
    if card($\Psi$) = 0 then Return NIL;
    else
        $Var := Var \cup \{\mathit{root}(X_i) \mid \mathit{in}(X_i, d : f(args)) \in \Psi\}$;
        for all pairs $(\chi_i, \chi_j)$, $\chi_j \in \Psi$, such that $\chi_j$ is dependent on $\chi_i \in L'$
            create an edge $(\chi_i, \chi_j)$ and add it to E;
        $L := L - \Psi$;
        $L' := L' \cup \Psi$;
Return (V, E);
End-Algorithm

Fig. 2. Create-cceg Algorithm
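
The pseudocode of Figure 2 can be sketched in Python as follows. This is one possible reading, not a reference implementation: the classes `Atom` and `Cmp`, their `name` labels, and the helper names are assumptions, variables are recognized by an upper-case initial (Convention 2.1), and "all variables in $\chi_i$" is interpreted as the variables occurring in the code call arguments or on either side of a comparison, which is how Example 2.11 applies the test. `depends_on` follows Definition 2.2 literally.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple, Union


def is_var(term: str) -> bool:
    """Convention 2.1: upper-case initial means variable, otherwise constant."""
    return term[:1].isupper()


def root(term: str) -> str:
    """Root of a (path) variable, e.g. root('FinanceRec.sales') == 'FinanceRec'."""
    return term.split(".", 1)[0]


@dataclass(frozen=True)
class Atom:
    """Atomic ccc in(out, d:f(args)); `name` is a hypothetical display label."""
    name: str
    out: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Cmp:
    """(In-)equality ccc `left op right`."""
    name: str
    left: str
    op: str
    right: str


Cond = Union[Atom, Cmp]


def input_roots(c: Cond) -> Set[str]:
    """root(cc) for an atom, var(ccc) for an (in-)equality."""
    terms = c.args if isinstance(c, Atom) else (c.left, c.right)
    return {root(t) for t in terms if is_var(t)}


def depends_on(cj: Cond, ci: Cond) -> bool:
    """Literal reading of Definition 2.2: is cj dependent on ci?"""
    source = {root(ci.out)} if isinstance(ci, Atom) else input_roots(ci)
    if isinstance(cj, Atom):
        return source <= input_roots(cj)
    return any(is_var(s) and root(s) in source for s in (cj.left, cj.right))


def initially_ok(c: Cond) -> bool:
    """Membership test for the set Ok of Figure 2."""
    if isinstance(c, Atom):
        return not any(is_var(a) for a in c.args)
    return not (is_var(c.left) and is_var(c.right))


def create_cceg(conds: List[Cond]) -> Optional[Tuple[List[Cond], Set[Tuple[Cond, Cond]]]]:
    """Return (V, E) if the conjunction is evaluable, otherwise None (NIL)."""
    done = [c for c in conds if initially_ok(c)]            # L'
    todo = [c for c in conds if not initially_ok(c)]        # L
    edges = {(ci, cj) for ci in done for cj in done
             if ci != cj and depends_on(cj, ci)}
    bound = {root(c.out) for c in done if isinstance(c, Atom)}   # Var
    while todo:
        psi = [c for c in todo
               if input_roots(c) <= bound and any(depends_on(c, d) for d in done)]
        if not psi:
            return None                                      # NIL: not evaluable
        bound |= {root(c.out) for c in psi if isinstance(c, Atom)}
        edges |= {(ci, cj) for ci in done for cj in psi if depends_on(cj, ci)}
        todo = [c for c in todo if c not in psi]
        done += psi
    return list(conds), edges


chi1 = Atom("chi1", "FinanceRec", ("financeRel", "date", "=", "11/15/99"))
chi2 = Cmp("chi2", "FinanceRec.sales", ">", "10K")
chi3 = Atom("chi3", "C", ("excelFile", "FinanceRec", "day"))
chi4 = Atom("chi4", "Slide", ("C", "presentation.ppt"))
graph = create_cceg([chi1, chi2, chi3, chi4])
edge_names = {(a.name, b.name) for a, b in graph[1]}

# Second condition of Example 2.9: ExcelFile is a variable that is never bound.
unsafe = create_cceg([
    Atom("a", "FinanceRec", ("financeRel", "data", "=", "11/15/99")),
    Atom("b", "C", ("ExcelFile", "FinanceRec", "day")),
])
```

On the conditions of Example 2.11 the sketch produces the three edges named in the example. Because it applies Case 2(a) of Definition 2.2 literally, it also reports an edge from `chi2` to `chi3`, since FinanceRec is the only variable of `chi2` and occurs in the code call of `chi3`. The unsafe condition of Example 2.9 yields `None`.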

Convention 2.2. Throughout the rest of this paper, we assume that all code call 
conditions considered are evaluable and that the graph associated with each code 
call condition has been generated. 

The Create-cceg algorithm runs in $O(n^3)$ time, where n is the number of constituents
$\chi_i$ of $\chi$. The number of iterations of the while loop is bounded by n, and
the body of the while loop can be executed in quadratic time.

We have conducted experiments to evaluate the execution time of the Create-cceg
algorithm. Those experiments are described in detail in Section 3.1.



Definition 2.12 (State of an agent). The state of an agent is a set of ground code
call conditions.

When an agent developer builds an agent, she specifies several parameters. One of
these parameters must include some domain-specific information, explicitly laying
out what inclusion and equality relations are known to hold of code calls. Such
information is specified via invariants.



Definition 2.13 (Invariant Expression).

- Every evaluable code call condition is an invariant expression. We call such
expressions atomic.

- If $ie_1$ and $ie_2$ are invariant expressions, then $(ie_1 \cup ie_2)$ and $(ie_1 \cap ie_2)$ are invariant
expressions. (We will often omit the parentheses.)

Example 2.14. Two examples of invariant expressions are:

in(StudentRec, rel : select(courseRel, exam, "=", midterm1)) &
in(C, excel : chart(excelFile, StudentRec, grade))

in(X, spatial : horizontal(T, B, U)) $\cup$ (in(Y, spatial : horizontal(T', B', U')) $\cup$
in(Z, spatial : horizontal(T', B', U))).

What is the meaning, i.e. the denotation of such expressions? The first invariant
represents the set of all objects c such that

in(StudentRec, rel : select(courseRel, exam, "=", midterm1)) &
in(c, excel : chart(excelFile, StudentRec, grade))

holds: we are looking for instantiations of C. Note that under this viewpoint,
the intermediate variable StudentRec which is needed in order to instantiate C to
an object c does not matter. There might just as well be situations where we are
interested in pairs (c, studentrec) instead of just c. Therefore a notion of denotation
must be flexible enough to allow this.
Let us now consider the invariant

in(StudentRec, rel : select(courseRel, exam, "=", TypeofExam)) &
in(C, excel : chart(excelFile, StudentRec, grade))

where the object midterm1 has been replaced by the variable TypeofExam which
is now a base variable. Then we might be interested in all c's that result if an
instantiation of TypeofExam is given, i.e. for different instantiations of TypeofExam
we get different c's. Thus we have to distinguish carefully between various sorts
of variables: base variables (defined in Definition 2.17), auxiliary variables and the
main variables defining the set of objects of interest.

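Definition 2.15 (Denotation of an Invariant Expression). Let ie be an invariant
expression with $\mathit{var}(ie) = \mathit{var}_{base}(ie) \cup \{V_1, \ldots, V_n\}$. The denotation of ie with
respect to a state S, an assignment $\theta$ of the variables in $\mathit{var}_{base}(ie)$ and a sequence
$\langle V_{i_1}, \ldots, V_{i_{n_k}} \rangle$ (where $\{V_{i_1}, \ldots, V_{i_{n_k}}\} \subseteq \{V_1, \ldots, V_n\}$) is defined as follows:

- Let

$[ie]_{S,\theta} := \{\, \langle o_{\pi(1)}, \ldots, o_{\pi(n_k)} \rangle \mid (ie\,\theta)\tau \text{ is ground and is true in state } S,$
$\quad \pi \text{ is a permutation on } \{1, \ldots, n\},\ n_k \le n,$
$\quad \tau \text{ is a grounding substitution},$
$\quad \tau \text{ is of the form } [V_1/o_1, \ldots, V_n/o_n] \,\}$

- $[ie_1 \cap ie_2]_{S,\theta} := [ie_1]_{S,\theta} \cap [ie_2]_{S,\theta}$,
- $[ie_1 \cup ie_2]_{S,\theta} := [ie_1]_{S,\theta} \cup [ie_2]_{S,\theta}$.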
The variables in $\{V_{\pi(1)}, \ldots, V_{\pi(n_k)}\}$ are called main variables while all remaining
variables $\{V_{\pi(n_k+1)}, \ldots, V_{\pi(n)}\}$ are called auxiliary. The substitution $\tau$ is
defined on the set of main variables (in our example above it is the set {C}). The
set of auxiliary variables consists of {StudentRec} and the only base variable is
TypeofExam. Taking the first viewpoint in our example above, $\tau$ would be defined
on {C, StudentRec}.
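
The role of main and auxiliary variables can be illustrated with a short Python sketch. It assumes that the grounding substitutions making the expression true in the current state have already been computed and are given as dictionaries; the function name `denotation` and the sample data in `SATISFYING` are hypothetical.

```python
from typing import Any, Dict, Iterable, Sequence, Set, Tuple

Substitution = Dict[str, Any]


def denotation(satisfying: Iterable[Substitution],
               main_vars: Sequence[str]) -> Set[Tuple[Any, ...]]:
    """Project each satisfying substitution onto the chosen main variables;
    the remaining (auxiliary) variables are dropped from the result."""
    return {tuple(tau[v] for v in main_vars) for tau in satisfying}


# Hypothetical substitutions satisfying the student/chart expression.
SATISFYING = [
    {"StudentRec": "rec_ann", "C": "chart_1"},
    {"StudentRec": "rec_bob", "C": "chart_1"},
    {"StudentRec": "rec_cy", "C": "chart_2"},
]
only_charts = denotation(SATISFYING, ["C"])
chart_student_pairs = denotation(SATISFYING, ["C", "StudentRec"])
```

With C as the only main variable the two substitutions that share `chart_1` collapse into one tuple, whereas treating StudentRec as a main variable as well keeps all three pairs. Union and intersection of invariant expressions then correspond to Python's set operators `|` and `&` on these results.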

As usual, we abuse notation and say that $ie_1 \subseteq ie_2$ if $[ie_1]_{S,\theta} \subseteq [ie_2]_{S,\theta}$ for all S
and all assignments $\theta$. Similarly, we say that $ie_1 = ie_2$ if $[ie_1]_{S,\theta} = [ie_2]_{S,\theta}$ for all S
and all assignments $\theta$. Now we are ready to define an invariant.

Definition 2.16 (Invariant Condition (ic)). An invariant condition atom is a statement
of the form $t_1 \; Op \; t_2$ where $Op \in \{\le, \ge, =\}$ and each of $t_1, t_2$ is either a
variable or a constant. An invariant condition (ic) is defined inductively as follows:

(1) Every invariant condition atom is an ic.

(2) If $C_1$ and $C_2$ are ic's, then $C_1 \wedge C_2$ and $C_1 \vee C_2$ are ic's.

Definition 2.17 (Invariant inv, $\mathit{var}_{base}(inv)$, $INV_{simple}$, $INV_{ordinary}$, INV). An
invariant, denoted by inv, is a statement of the form

$ic \Longrightarrow ie_1 \;\Re\; ie_2$    (1)

where

(1) ic is an invariant condition, all variables occurring in ic are among
$\mathit{var}_{base}(ie_1) \cup \mathit{var}_{base}(ie_2)$,

(2) $\Re \in \{=, \subseteq\}$, and

(3) $ie_1$, $ie_2$ are invariant expressions.

If $ie_1$ and $ie_2$ both contain solely atomic code call conditions, then we say that inv
is a simple invariant.

If ic is a conjunction of invariant condition atoms, then we say that inv is an
ordinary invariant.

We denote by $\mathit{var}_{base}(inv)$ the set of all variables of inv that need to be instantiated
in order to evaluate inv in the current state: $\mathit{var}_{base}(inv) := \mathit{var}_{base}(ie_1) \cup \mathit{var}_{base}(ie_2)$.

The set of all invariants is denoted by INV. The set of all simple invariants is
denoted by $INV_{simple}$ and the set of all ordinary invariants is denoted by $INV_{ordinary}$.

An invariant expresses semantic knowledge about a domain. Invariants used by
each of our two example agents, ppt and hdb, are given below.

Example 2.18. The following are valid invariant conditions: $val_1 \le val_2$, $Rel_1 = Rel_2$.
Note that such expressions can be evaluated over a given state S. Only the
two relations $\le$ and $\ge$ require that the constants occurring on the right or left hand
sides must be of the appropriate type: these relations must be defined over each
state S.

The invariant

File = File' $\wedge$ Rec = Rec' $\wedge$ Col = Col' $\Longrightarrow$
in(C, excel : chart_one(File, Rec, Col)) = in(C', excel : chart_two(File', Rec', Col'))

says that these two code call conditions are equivalent if their arguments unify.
Note that the code calls involved are different. The invariant

Rel = Rel' $\wedge$ Attr = Attr' $\wedge$ Op = Op' = "$\le$" $\wedge$ Val $\le$ Val' $\Longrightarrow$
in(X, rel : select(Rel, Attr, Op, Val)) $\subseteq$ in(Y, rel : select(Rel', Attr', Op', Val'))

says that the code call condition in(X, rel : select(Rel, Attr, Op, Val)) can be evaluated
by using the results of the code call condition

in(Y, rel : select(Rel', Attr', Op', Val'))

if the above conditions are satisfied. Note that this expresses semantic information
that is not available on the syntactic level: the operator "$\le$" is related to the
relation symbol $\le$.
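
A minimal sketch of how an agent might exploit the second invariant is given below. It assumes an in-memory list of rows and a selection operator written `<=`; the function names `select_leq` and `select_leq_from_cache` and the sample data are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, float]


def select_leq(rows: List[Row], attr: str, val: float) -> List[Row]:
    """Toy stand-in for rel:select(Rel, Attr, "<=", Val) over in-memory rows."""
    return [r for r in rows if r[attr] <= val]


def select_leq_from_cache(cached: List[Row], attr: str,
                          val: float, cached_val: float) -> List[Row]:
    """Answer select(..., "<=", val) from the cached answer of
    select(..., "<=", cached_val) on the same relation and attribute.
    Only valid when the invariant condition val <= cached_val holds."""
    if val > cached_val:
        raise ValueError("invariant condition Val <= Val' does not hold")
    return [r for r in cached if r[attr] <= val]


ROWS = [{"id": 1, "qty": 500}, {"id": 2, "qty": 1200}, {"id": 3, "qty": 1800}]
wide = select_leq(ROWS, "qty", 1500)                         # computed once
narrow = select_leq_from_cache(wide, "qty", 1000, 1500)      # derived from it
```

The narrower answer is obtained by filtering the cached result instead of scanning the relation again, which is the saving the inclusion invariant is meant to license when the underlying relation is remote or large.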

Convention 2.3. Throughout the rest of this paper, we assume that we have the
code calls ag : addition(X, Y) and ag : subtraction(X, Y) available for all agents ag.
These code calls return the sum (resp. the difference) of X and Y, where X and Y
range over the reals or the integers. We also assume we have code calls ag : ge_0(X)
(resp. ag : geq_0(X)) available which return 1 if X is strictly greater (resp. greater
or equal) than 0 and 0 otherwise.

By stating invariants, we focus interest on states where the invariants hold. This 
is like in classical predicate logic, where we write down axioms and thereby constrain 
the set of models — we are only interested in the class of models satisfying the 
axioms. We therefore have to define formally what it means for a state S to satisfy 
an invariant inv.

Definition 2.19 (Satisfaction, $S \models inv$, $\mathcal{I} \models inv$, Taut).
A state S satisfies the invariant inv having the form shown in Formula (1) above
with respect to an assignment $\theta$ iff for every ground instance $(inv\,\theta)\tau$ of $inv\,\theta$, it is
the case that either $(ic\,\theta)\tau$ evaluates to false, or $(ie_1\theta)\tau \;\Re\; (ie_2\theta)\tau$ is true in S.

We say that a set of invariants $\mathcal{I}$ entails an invariant inv iff all states S and
assignments $\theta$ satisfying $\mathcal{I}$ also satisfy inv. We write $\mathcal{I} \models inv$. We call an invariant
inv a tautology if inv is true in all states S for all assignments $\theta$:

$\mathit{Taut} =_{def} \{inv \mid\; \models inv\}$.

From now on we do not mention explicitly the assignment $\theta$ and we write simply
$S \models inv$.

It is worth noting that there are indeed trivial invariants that are satisfied in all 
states: such invariants are like tautologies in classical logic (therefore their name 
in the last definition). For example the following invariant is true in all states 
whatsoever (note the difference from the similar invariant above): 



File = File' $\wedge$ Rec = Rec' $\wedge$ Col = Col' $\Longrightarrow$
in(C, excel : chart(File, Rec, Col)) = in(C', excel : chart(File', Rec', Col'))

The reason that this last invariant is a tautology is that for the same set of instances
of Y for a code call d : f(Y), we always get the same set representing the atomic code
call condition in(X, d : f(Y)).

Theorem 2.20.

There is a translation Trans which associates with each conjunction ic of invariant
condition atoms and each invariant expression ie another invariant expression $\mathit{Trans}(ic, ie)$
such that the following holds for all states S, assignments $\theta$ and invariants $ic \Longrightarrow ie_1 \;\Re\; ie_2$:

$(S, \theta) \models (ic \Longrightarrow ie_1 \;\Re\; ie_2)$
if and only if
$(S, \theta) \models \mathit{true} \Longrightarrow \mathit{Trans}(ic, ie_1) \;\Re\; \mathit{Trans}(ic, ie_2)$.

Corollary 2.21 (Eliminating Invariant Conditions, Trans). Let inv: $ic \Longrightarrow ie_1 \;\Re\; ie_2$
be an arbitrary invariant. Then, the following holds for all states S and assignments
$\theta$:

$(S, \theta) \models (ic \Longrightarrow ie_1 \;\Re\; ie_2)$
if and only if
$(\forall C_i, 1 \le i \le m)\; (S, \theta) \models \mathit{true} \Longrightarrow \mathit{Trans}(C_i, ie_1) \;\Re\; \mathit{Trans}(C_i, ie_2)$,

where the $C_i$, $1 \le i \le m$, are the disjuncts in the DNF of ic.

3. ARCHITECTURE 

Let us suppose now that we have a set $\mathcal{I}$ of invariants, and a set S of data structures
that are manipulated by the agent. How exactly should a set C of code call
conditions be merged together? And what needs to be done to support this? Our
architecture contains two parts:

(i) a development time phase stating what the agent developer must specify when 
building her agent, and what algorithms are used to operate on that specifica- 
tion, and 

(ii) a deployment time phase which specifies how the above development-time spec- 
ifications are used when the agent is in fact running autonomously. 

We describe each of these pieces below. 
3.1 Development Time Phase 

When the agent developer builds her agent, the following things need to be done. 

(1) First, the agent developer specifies a set $\mathcal{I}$ of invariants.

(2) Suppose C is a set of ccc's to be evaluated by the agent. Each code call
condition $\chi \in C$ is represented via an evaluable cceg. Let INS(C) represent the
set of all nodes in ccegs of $\chi$'s in C. That is,

$INS(C) = \{\chi_i \mid \exists \chi \in C \text{ s.t. } \chi_i \text{ is in } \chi\text{'s cceg}\}$.

This can be done by a topological sort of the cceg for each $\chi \in C$.

(3) Additional invariants can be derived from the initial set $\mathcal{I}$ of invariants. This
requires the ability to check whether a set $\mathcal{I}$ of invariants implies an inclusion
relationship between two invariant expressions. We will provide a generic test
called Chk_Imp for implication checking between invariants. Although the set
of invariants entailed by $\mathcal{I}$ is defined by Definition 2.19, the set of invariants
actually derived by the Chk_Imp test will depend on the set of axioms used
in the test. Hence, some Chk_Imp tests will be sound, but not complete. On
the other hand, some tests will be "more complete" than others, because the
set of invariants derived by them will be a superset of the set of invariants
derived by others. "More complete" tests may use a larger set of axioms, hence
will be more expensive to compute. The agent developer can select a test
that is appropriate for her agent. Given an arbitrary (but fixed) Chk_Imp
test, we will provide an algorithm called Compute-Derived-Invariants that
calculates the set of derivable invariants from the initial set $\mathcal{I}$ of invariants and
needs to be executed just once.

3.2 Deployment Time Phase 

Once the agent has been "developed" and deployed and is running, it will need to 
continuously determine how to merge a set C of code call conditions. This will be 
done as follows: 

(1) The system must identify three types of relationships between nodes in INS(C).

Identical ccc's: First, we'd like to identify nodes $\chi_1, \chi_2 \in \mathit{INS}(C)$ which are
"equivalent" to one another, i.e. $\chi_1 \equiv \chi_2$ is a logical consequence of the
set of invariants $\mathcal{I}$. This requires a definition of equivalence of two code
call conditions w.r.t. a set of invariants. This strategy is useful because we
can replace the two nodes $\chi_1, \chi_2$ by a single node. This avoids redundant
computation of both $\chi_1$ and $\chi_2$.

Implied ccc's: Second, we'd like to identify nodes $\chi_1, \chi_2 \in \mathit{INS}(C)$ which are
not equivalent in the above sense, but such that either $\chi_1 \to \chi_2$ or $\chi_2 \to \chi_1$
holds, but not both. Suppose $\chi_1 \to \chi_2$. Then we can compute $\chi_2$ first, and
then compute $\chi_1$ from the answer returned by computing $\chi_2$. This way of
computing $\chi_1, \chi_2$ may be faster than computing them separately.

Overlapping ccc's: Third, we'd like to identify nodes $\chi_1, \chi_2 \in \mathit{INS}(C)$ for
which the preceding two conditions do not hold, but $\chi_1 \,\&\, \chi_2$ is consistent
with INS(C). In this case, we might be able to compute the answer to
$\chi_1 \vee \chi_2$. From the answer to this, we may compute the answer to $\chi_1$
and the answer to $\chi_2$. This way of computing $\chi_1, \chi_2$ may be faster than
computing them separately.

We will provide an algorithm, namely Improved-CSI, which will use the set of 
derived invariants returned by the Compute-Derived-Invariants algorithm 
above, to detect commonalities (equivalent, implied and overlapping code call 
conditions) among members of C. 

Example 3.1. The two code call conditions in(X, spatial:vertical(T, L, R))
and in(Y, spatial:vertical(T', L', R')) are equivalent to one another if their
arguments are unifiable. The result of evaluating the code call condition

in(Z, spatial:range(T, 40, 50, 25))

is a subset of the result of evaluating the code call condition

in(W, spatial:range(T', 40, 50, 50))

if T = T'. Note that spatial:range(T, X, Y, Z) returns all points in T that are within Z
units of the point (X, Y). In this case, we can compute the results of the
former code call condition by executing a selection on the results of the latter
rather than executing the former from scratch. Finally, consider the following
two code call conditions:

in(X, spatial:horizontal(map, 100, 200)),
in(Y, spatial:horizontal(map, 150, 250)).

Here spatial:horizontal(map, a, b) returns all points (X, Y) in map such that
a < Y < b. Obviously, the results of neither of these two code call conditions
are a subset of the results of the other. However, the results of these two code call
conditions overlap with one another. In this case, we can execute the code call
condition in(Z, spatial:horizontal(map, 100, 250)). Then, we can compute the
results of the two code call conditions by executing selections on the results of
this code call condition.
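
The overlapping case at the end of Example 3.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: points are (x, y) tuples held in memory, the bounds are strict as written above, and the names `horizontal` and `merged_horizontal` are hypothetical stand-ins rather than part of any agent platform.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]

def horizontal(points: Iterable[Point], a: float, b: float) -> List[Point]:
    """Hypothetical stand-in for spatial:horizontal(map, a, b): a < Y < b."""
    return [(x, y) for (x, y) in points if a < y < b]

def merged_horizontal(points: Iterable[Point],
                      r1: Tuple[float, float],
                      r2: Tuple[float, float]):
    """Answer two overlapping band queries with a single scan of the map."""
    lo, hi = min(r1[0], r2[0]), max(r1[1], r2[1])
    merged = horizontal(points, lo, hi)   # the one expensive evaluation
    # Selections on the cached merged result recover each original answer.
    return horizontal(merged, *r1), horizontal(merged, *r2)

demo_map = [(0, 90), (1, 120), (2, 180), (3, 220), (4, 260)]
ans1, ans2 = merged_horizontal(demo_map, (100, 200), (150, 250))
```

Whether the single merged evaluation is actually cheaper than two separate ones depends on the cost model, which is why the merging procedures take one as input.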

(2) We will then provide two procedures to merge sets of code call conditions,
BFMerge and DFMerge, that take as input (i) the set C, (ii) the output
of the Improved-CSI algorithm above, and (iii) a cost model for agent
code call condition evaluations. Both these algorithms are parameterized by
heuristics and we propose three alternative heuristics. Then we evaluate our
six implementations (3 heuristics times 2 algorithms) and also compare them with
an A* based approach.

4. DEVELOPMENT PHASE 

Prior to deployment of the agent, once the agent developer has defined a set of 
invariants, we compute a set of derived invariants from it. These derived invariants 
are stored. Once deployed, when the agent is confronted with a set of requests from 
other agents, it can examine these stored derived invariants for a "pattern match" 
which then enables it to classify invariants into one of the three categories listed 
(equivalent, implied or overlapping invariants). 

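Consider the case when $\mathcal{I}$ contains the two invariants:

$$V_1 < V_2 \Longrightarrow \mathbf{in}(X, d_1{:}f_1(V_1)) = \mathbf{in}(Y, d_2{:}f_2(V_2)). \qquad (2)$$
$$V_3 < V_4 \Longrightarrow \mathbf{in}(Z, d_2{:}f_2(V_3)) \subseteq \mathbf{in}(W, d_3{:}f_3(V_4)). \qquad (3)$$

Clearly from these two invariants, we can infer the invariant

$$V_1 < V_2 \wedge V_2 < V_4 \Longrightarrow \mathbf{in}(X, d_1{:}f_1(V_1)) \subseteq \mathbf{in}(W, d_3{:}f_3(V_4)). \qquad (4)$$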

Algorithm Combine_1 (Figure 3) combines two invariants. The algorithm uses
a simplify routine which simplifies a conjunction of invariant conditions and checks
whether the resulting invariant condition is inconsistent. If so, Combine_1 returns NIL. The
Combine_1 algorithm makes use of two important algorithms, Chk_Imp and
Chk_Ent, which we will discuss in detail later. The Chk_Imp algorithm checks
if one invariant expression implies another, while the Chk_Ent algorithm checks
if some member of a set of invariants entails another invariant. Let us first define
the table which is implemented by the Combine_1 algorithm.

Definition 4.1 (Combine_1). Let $inv_1 : ic_1 \Longrightarrow ie_1 \,\Re_1\, ie_1'$ and $inv_2 : ic_2 \Longrightarrow ie_2 \,\Re_2\, ie_2'$.
Then, the following table provides the resulting derived relation of the form

$$\mathit{simplify}(ic_1 \wedge ic_2) \Longrightarrow ie_1 \;\mathit{derived\_rel}\; ie_2'$$

when $inv_1$ and $inv_2$ are combined. The "*" denotes a "don't care" condition in this
table. The simplify routine checks whether $ic_1 \wedge ic_2$ is inconsistent. If so, it returns
false; if not, it returns an equivalent (perhaps simplified) formula for $ic_1 \wedge ic_2$ (the
precise realization of simplify is not important here and leaves enough freedom for
the actual implementation):

$$\mathit{simplify}(ic) = \begin{cases} \mathit{true}, & \text{if } (S,\theta) \models ic \text{ for all } (S,\theta),\\ \mathit{false}, & \text{if } (S,\theta) \models \neg ic \text{ for all } (S,\theta),\\ \varphi, & \text{otherwise.} \end{cases}$$

    $\Re_1$        $\Re_2$        Chk_Imp(ie'1, ie2)   Chk_Imp(ie2, ie'1)   derived_rel
    *              *              False                *                    NIL
    =              =              True                 True                 =
    =              $\subseteq$    True                 *                    $\subseteq$
    $\subseteq$    =              True                 *                    $\subseteq$
    $\subseteq$    $\subseteq$    True                 *                    $\subseteq$

Table 1. Summary of Combining Two Invariants

Figure 3 implements a slightly generalized version of the last definition. Namely,
we assume that there is given a set $\mathcal{I}$ of invariants and we are considering states
satisfying these invariants. This is an additional parameter. For simplicity, assume
that $\Re_1 = \Re_2 =$ "$\subseteq$". The idea is that although the subset relation $ie_1' \subseteq ie_2$ might
not hold in general (i.e. in all states) it could be implied by the invariants in $\mathcal{I}$
(i.e. it holds in all states satisfying $\mathcal{I}$). That is, there is $ic^* \Longrightarrow ie_1^* \subseteq ie_2^* \in \mathcal{I}$
such that $ic^{**} \Longrightarrow ie_1' \subseteq ie_1^* \in \mathcal{I}$, $ic^{***} \Longrightarrow ie_2^* \subseteq ie_2 \in \mathcal{I}$, and
$(ic_1 \wedge ic_2) \to (ic^* \wedge ic^{**} \wedge ic^{***})$. Under these conditions, we can derive the invariant
$\mathit{simplify}(ic_1 \wedge ic_2) \Longrightarrow ie_1 \subseteq ie_2'$.

We introduce three notions, Chk_Imp, Chk_Taut and Chk_Ent, of increasing
complexity. The first notion, Chk_Imp, is a relation between invariant expressions.

Definition 4.2 (Implication: Chk_Imp, $ie_1 \to ie_2$).
An invariant expression $ie_1$ is said to imply another invariant expression $ie_2$,
denoted by $ie_1 \to ie_2$, iff it is the case that $[ie_1]_{S,\theta} \subseteq [ie_2]_{S,\theta}$ for all S and all
assignments $\theta$.

(Footnote to the definition of simplify: $\varphi$ is any formula equivalent to $ic$, i.e. for all states $(S, \theta)$: $(S, \theta) \models \varphi \leftrightarrow ic$.)
Combine_1(inv1, inv2, ℐ)
/* inv1 : ic1 ⟹ ie1 ℜ1 ie'1 */
/* inv2 : ic2 ⟹ ie2 ℜ2 ie'2 */

if (Chk_Imp(ie'1, ie2) = false) and
   (there is no ic* ⟹ ie*1 ⊆ ie*2 ∈ ℐ s.t.
      (ic** ⟹ ie'1 ⊆ ie*1 ∈ ℐ, ic*** ⟹ ie*2 ⊆ ie2 ∈ ℐ,
       (ic1 ∧ ic2) → (ic* ∧ ic** ∧ ic***))), then
    Return NIL;
if (ℜ1 = ℜ2 = "=") then
    if (Chk_Imp(ie2, ie'1) = true) or
       (there is ic* ⟹ ie*1 ⊆ ie*2 ∈ ℐ s.t.
          (ic** ⟹ ie2 ⊆ ie*1 ∈ ℐ, ic*** ⟹ ie*2 ⊆ ie'1 ∈ ℐ,
           (ic1 ∧ ic2) → (ic* ∧ ic** ∧ ic***))), then
        relation := (ie1 = ie'2);
    else relation := (ie1 ⊆ ie'2);
else relation := (ie1 ⊆ ie'2);
derived_ic := simplify(ic1 ∧ ic2);
if (derived_ic = false) then Return NIL;
derived_inv := (derived_ic ⟹ relation);
if (there is inv ∈ ℐ with Chk_Ent(inv, derived_inv) = true) then
    Return NIL;
else Return derived_inv;
End-Algorithm

Fig. 3. Combine_1 Algorithm



Chk_Imp is said to be an implication check algorithm if it takes two invariant
expressions $ie_1, ie_2$ and returns a boolean output. We say that Chk_Imp is sound
iff whenever Chk_Imp($ie_1, ie_2$) = true, then $ie_1$ implies $ie_2$. We say Chk_Imp is
complete iff Chk_Imp($ie_1, ie_2$) = true if and only if $ie_1$ implies $ie_2$.

If Chk_Imp$_1$, Chk_Imp$_2$ are both sound, and for all $ie_1, ie_2$, Chk_Imp$_1$($ie_1, ie_2$) =
true implies that Chk_Imp$_2$($ie_1, ie_2$) = true, then we say that Chk_Imp$_2$ is more
complete than Chk_Imp$_1$.



Definition 4.3 (Chk_Taut, Chk_Ent as Relations between Invariants). Chk_Taut
is said to be a tautology check algorithm if it takes a single invariant inv and
returns a boolean output. Chk_Taut is sound iff whenever Chk_Taut(inv) = true,
then $inv \in \mathit{Taut}$ (see Definition 2.1i). Chk_Taut is complete iff Chk_Taut(inv)
= true if and only if $inv \in \mathit{Taut}$.

Chk_Ent is said to be an entailment check algorithm if it takes two invariants
$inv_1, inv_2$ and returns a boolean output. We say that Chk_Ent is sound iff whenever
Chk_Ent($inv_1, inv_2$) = true, then $inv_1$ entails $inv_2$ ($inv_1 \models inv_2$). We say Chk_Ent
is complete iff Chk_Ent($inv_1, inv_2$) = true if and only if $inv_1$ entails $inv_2$.

Similarly to Definition 4.1, we use the notion of being more complete for
tautology as well as for entailment check algorithms.



Lemma 4.4 (Relation between Chk_Imp, Chk_Taut and Chk_Ent).
(1) Chk_Imp can be reduced to Chk_Taut:

$$\mathit{Chk\_Imp}(ie_1, ie_2) \text{ if and only if } \mathit{Chk\_Taut}(\mathit{true} \Longrightarrow ie_1 \subseteq ie_2).$$

(2) Chk_Taut can be reduced to Chk_Imp:

$$\mathit{Chk\_Taut}((C_1 \vee C_2 \vee \ldots \vee C_m) \Longrightarrow ie_1 \subseteq ie_2)$$

if and only if

$$\forall C_i,\ 1 \le i \le m:\ \mathit{Chk\_Imp}(\mathit{Trans}(C_i, ie_1), \mathit{Trans}(C_i, ie_2)).$$

(3) Chk_Taut is an instance of Chk_Ent.

Thus in general, implication checking between invariant expressions is a special case
of tautology checking of invariants. Conversely, checking tautologies is an instance
of implication checking. Note that checking simple invariants is reduced to checking
implications of non-simple invariant expressions.

It is also obvious that checking for tautologies is a special case of the entailment
problem.

The following results tell us that the implication check required by the Chk_Imp
routine used in the Combine_1 algorithm is undecidable in general. Even if we restrict to
finite domains, it is still intractable.

Proposition 4.5 (Undecidability of Chk_Imp, Chk_Taut, Chk_Ent).
Suppose we consider arbitrary datatypes. Then the problem of checking whether
an arbitrary invariant expression $ie_1$ implies another invariant expression $ie_2$ is
undecidable. The same holds for checking tautologies of invariants or entailment
between invariants.

Proposition 4.6 (co-NP Completeness of Checking Implication).
Suppose all datatypes have a finite domain (i.e. each datatype has only finitely
many values of that datatype). Then the problem of checking whether an arbitrary
invariant expression $ie_1$ implies another invariant expression $ie_2$ is co-NP complete.
The same holds for the problem of checking whether an invariant is a tautology.

As the problem of checking implication (and hence equivalence) between invariant 
expressions is co-NP complete, in this paper, we decided to study the tradeoffs 
involved in using sound, but perhaps incomplete implementations of implication 
checking. 

There are clearly many ways of implementing the algorithm Chk_Imp that are
sound, but not complete. In this paper, we propose a generic algorithm to
implement Chk_Imp, where the complexity can be controlled by two input parameters:
an axiomatic inference system and a threshold.

— The axiomatic inference system used by Chk_Imp includes some axioms and
inference rules. By selecting the axioms and inference rules, the agent developer
is controlling the branching factor of the search space.

— The second parameter, called the threshold, is either an integer or $\infty$, and
determines the maximum depth of the search tree. If it is $\infty$, then the generic
algorithm does not have an upper bound on the number of rule applications,
and terminates either when it proves the implication or when there is no further rule
that is applicable (i.e. failure). When it is an integer value, the algorithm reports
failure if it cannot prove the implication by using the threshold number of rule
applications.
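
The role of the two parameters can be illustrated with a small Python sketch. It is not the paper's generic algorithm: it assumes the inference system has been flattened into a finite table `axioms` that maps an expression to expressions known to contain it, and it treats `threshold=None` as the unbounded case. All names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Optional, Set

def chk_imp(lhs: Hashable, rhs: Hashable,
            axioms: Dict[Hashable, Set[Hashable]],
            threshold: Optional[int] = None) -> bool:
    """Sound, possibly incomplete test for lhs -> rhs.

    axioms[e] lists expressions assumed to contain e. threshold bounds the
    number of axiom applications along a proof (None means no bound).
    """
    if lhs == rhs:
        return True
    seen = {lhs}
    frontier = deque([(lhs, 0)])          # breadth-first: shortest proofs first
    while frontier:
        expr, depth = frontier.popleft()
        if threshold is not None and depth >= threshold:
            continue                      # budget exhausted on this branch
        for bigger in axioms.get(expr, ()):
            if bigger == rhs:
                return True
            if bigger not in seen:
                seen.add(bigger)
                frontier.append((bigger, depth + 1))
    return False                          # failure: not proved, not refuted

axioms = {"range25": {"range50"}, "range50": {"range100"}}
```

A `False` answer only means that no proof was found within the budget, which is exactly why such a test is sound but not complete; a richer axiom table widens the search, a larger threshold deepens it.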

We have conducted experiments with different instances of these two parameters.
Those experiments are discussed in detail in Section 6.2.

It is important to note that the set of all derived invariants obtained from $\mathcal{I}$ may
be very large because they contain "redundant" constraints. For instance, using
our example $\mathcal{I}$ above, every invariant of the form

$$V_1 < V_2 \wedge V_2 < V_4 \Longrightarrow \mathbf{in}(X, d_1{:}f_1(V_1)) \subseteq \mathbf{in}(W, d_3{:}f_3(V_4)) \cup \mathbf{in}(T, d_4{:}f_4(V_5)) \ldots$$

would be entailed from $\mathcal{I}$. However, these invariants are redundant as they are
entailed by the single invariant (4).



As we have seen above (Propositions 4.5, 4.6), such an entailment test between
invariants is either undecidable or intractable. It would be much better if we had
a purely syntactical test (which must be necessarily incomplete) of checking such
implications.

The following lemma shows that entailment between two invariants can, under
certain assumptions, be reduced to a syntactical test.

Lemma 4.7. Let $inv_1 : ic_1 \Longrightarrow ie_1 \subseteq ie_1'$ and $inv_2 : ic_2 \Longrightarrow ie_2 \subseteq ie_2'$ be
two simple invariants, i.e. $ie_1$ has the form $\mathbf{in}(X, d_1{:}f_1(\ldots))$, $ie_1'$ has the form
$\mathbf{in}(X, d_1'{:}f_1'(\ldots))$, $ie_2$ has the form $\mathbf{in}(Y, d_2{:}f_2(\ldots))$ and $ie_2'$ has the form $\mathbf{in}(Y, d_2'{:}f_2'(\ldots))$.

If $inv_1 \models inv_2$ and $inv_2$ is not a tautology ($\not\models inv_2$), then the following holds:

(1) $d_1 = d_2$ and $f_1 = f_2$,

(2) $d_1' = d_2'$ and $f_1' = f_2'$,

(3) In all states that do not satisfy $inv_2$, it holds that "$ic_2 \to ic_1$". I.e. each
counterexample for $inv_2$ is also a counterexample for $inv_1$.



Corollary 4.8 (Sufficient Condition for Chk_Ent).
There is a sufficient condition for Chk_Ent($inv_1, inv_2$) based on Chk_Imp and
Chk_Taut: First check whether $inv_2 \in \mathit{Taut}$. If yes, Chk_Ent($inv_1, inv_2$) holds. If
not, check whether $ic_2 \to ic_1$ holds in all states (i.e. Chk_Imp($ic_2, ic_1$)). If yes,
Chk_Ent($inv_1, inv_2$) holds.

In this paper, we use the following sound but incomplete Chk_Ent algorithm.
Let $inv_1 : ic_1 \Longrightarrow ie_1 \,\Re_1\, ie_1'$ and $inv_2 : ic_2 \Longrightarrow ie_2 \,\Re_2\, ie_2'$. Then, Chk_Ent($inv_1, inv_2$) =
true iff

(1) For all states S: $S \models ic_2 \to ic_1$,

(2) ($\Re_1$ = "$\subseteq$" and $\Re_2$ = "$\subseteq$") or ($\Re_1$ = "=" and $\Re_2$ = "$\subseteq$"),

(3) $ie_2 \to ie_1$,

(4) $ie_1' \to ie_2'$.
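
The four conditions can be read as a short Python predicate. This is a sketch under assumptions, not the authors' code: invariants are modelled by a hypothetical `Invariant` record, and the two oracles `cond_implies` (for invariant conditions) and `chk_imp` (for invariant expressions) are supplied by the caller, since their realization is left open above.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Invariant:
    ic: str    # invariant condition
    lhs: str   # left invariant expression (ie)
    rel: str   # "subset" or "equal"
    rhs: str   # right invariant expression (ie')

Oracle = Callable[[str, str], bool]

def chk_ent(inv1: Invariant, inv2: Invariant,
            cond_implies: Oracle, chk_imp: Oracle) -> bool:
    """Sound but incomplete test that inv1 entails inv2."""
    rel_ok = inv2.rel == "subset" and inv1.rel in ("subset", "equal")  # (2)
    return (cond_implies(inv2.ic, inv1.ic)      # (1) ic2 -> ic1
            and rel_ok
            and chk_imp(inv2.lhs, inv1.lhs)     # (3) ie2 -> ie1
            and chk_imp(inv1.rhs, inv2.rhs))    # (4) ie'1 -> ie'2

# Toy oracles, assumed purely for illustration.
known = {("r25", "r50"), ("r100", "r200")}
toy_imp = lambda a, b: a == b or (a, b) in known
toy_cond = lambda a, b: a == b or b == "true"

general = Invariant("true", "r50", "subset", "r100")
special = Invariant("V > 0", "r25", "subset", "r200")
```

Here `general` entails `special` because the latter has a stronger condition, a smaller left-hand side and a larger right-hand side; the converse test fails on condition (1).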
4.1 Computing All Derived Invariants

In this section, we define how, given a set $\mathcal{I}$, the set of all invariants that are entailed
by $\mathcal{I}$ may be computed using the selected Chk_Imp and Chk_Ent algorithms.

The Compute-Derived-Invariants algorithm presented in Figure 4 takes as
input a set of invariants $\mathcal{I}$, and returns a set of invariants $\mathcal{I}^*$, such that every
invariant in $\mathcal{I}^*$ is entailed by $\mathcal{I}$. Although the Compute-Derived-Invariants
algorithm has exponential running time, it is executed only once at registration
time, and hence the worst case complexity of the algorithm is acceptable.



Compute-Derived-Invariants(ℐ)
    X := ℐ;
    change := true;
    Done := ∅;
    while change do
        change := false;
        forall inv_i ∈ X do
            forall inv_j ∈ X − {inv_i} s.t. (inv_i, inv_j) ∉ Done do
                derived_inv1 := Combine_1(inv_i, inv_j, X);
                if derived_inv1 != NIL then
                    X := X ∪ {derived_inv1}; change := true;
                derived_inv2 := Combine_1(inv_j, inv_i, X);
                if derived_inv2 != NIL then
                    X := X ∪ {derived_inv2}; change := true;
                derived_inv3 := Combine_2(inv_i, inv_j);
                if derived_inv3 != NIL then
                    X := X ∪ {derived_inv3}; change := true;
                derived_inv4 := Combine_3(inv_i, inv_j);
                if derived_inv4 != NIL then
                    X := X ∪ {derived_inv4}; change := true;
                Done := Done ∪ {(inv_i, inv_j), (inv_j, inv_i)};
    Return X.
End-Algorithm

Fig. 4. Compute-Derived-Invariants Algorithm
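
The fixpoint structure of Figure 4 can be sketched in Python with pluggable combiners. This is a simplified illustration, not the algorithm itself: it visits ordered pairs one at a time rather than both orders together, it assumes invariants are hashable values, and the toy `transitive` combiner (pairs (a, b) read as "a is contained in b") merely stands in for Combine_1, Combine_2 and Combine_3.

```python
from itertools import permutations
from typing import Callable, Hashable, Iterable, Optional, Set

Combiner = Callable[[Hashable, Hashable, Set[Hashable]], Optional[Hashable]]

def compute_derived_invariants(initial: Iterable[Hashable],
                               combiners: Iterable[Combiner]) -> Set[Hashable]:
    derived = set(initial)
    combiners = list(combiners)
    done = set()                      # ordered pairs already combined
    changed = True
    while changed:
        changed = False
        # list(...) snapshots the set so it can grow during the pass.
        for a, b in permutations(list(derived), 2):
            if (a, b) in done:
                continue
            done.add((a, b))
            for combine in combiners:
                new = combine(a, b, derived)
                if new is not None and new not in derived:
                    derived.add(new)
                    changed = True    # new invariants need another pass
    return derived

def transitive(a, b, known):
    """Toy combiner: (x in y) and (y in z) yield (x in z)."""
    return (a[0], b[1]) if a[1] == b[0] and a[0] != b[1] else None

closure = compute_derived_invariants(
    {("a", "b"), ("b", "c"), ("c", "d")}, [transitive])
```

The loop stops once a full pass adds nothing new, so termination relies on the combiners producing only finitely many distinct invariants.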



Lemma 4.9. For all $\mathcal{I}$: $\{inv_1, inv_2\} \cup \mathcal{I} \models \mathit{Combine\_1}(inv_1, inv_2, \mathcal{I})$.

Combine_1 does not derive all invariants that are logically entailed by $\mathcal{I}$. For
example, from "$\mathit{true} \Longrightarrow ie_1 \subseteq ie_2$" and "$\mathit{true} \Longrightarrow ie_2 \subseteq ie_1$" we can infer
"$\mathit{true} \Longrightarrow ie_1 = ie_2$". We call this procedure, slightly generalized, Combine_2. It is illustrated in
Figure 5. The unify routine takes two invariant expressions and returns the most
general unifier if the two are unifiable, and returns NIL if they are not unifiable.

Another set we need is the set of all invariant tautologies

$$\mathit{Taut} =_{def} \{\mathit{true} \Longrightarrow ie_1 \subseteq ie_2 : \mathit{Chk\_Imp}(ie_1, ie_2)\}.$$

Obviously, all tautologies are satisfied in all states and the invariant computed in
the Combine_2 Algorithm (if it exists) is entailed by the invariants it is computed
from.

Lemma 4.10. $\{inv_1, inv_2\} \models \mathit{Combine\_2}(inv_1, inv_2)$.
Combine_2(inv1, inv2)
/* inv1 : ic1 ⟹ ie1 ℜ1 ie'1 */
/* inv2 : ic2 ⟹ ie2 ℜ2 ie'2 */

if (ℜ1 = ℜ2 = "⊆") then
    θ := unify(ie2, ie'1);
    γ := unify(ie'2, ie1);
    if (θ != NIL) and (γ != NIL) then
        derived_ic := simplify((ic1 ∧ ic2)θγ);
        if (derived_ic = false) then Return NIL;
        derived_inv := [derived_ic ⟹ (ie1)θγ = (ie'1)θγ];
        Return derived_inv;
    else Return NIL.
else Return NIL.
End-Algorithm

Fig. 5. Combine_2 Algorithm



However, the above sets are still not sufficient. Consider the situation
$inv_1 : x < \ldots \Longrightarrow ie_1 \,\Re_1\, ie_1'$, and $inv_2 : x > \ldots \Longrightarrow ie_1 \,\Re_1\, ie_1'$. Then, we can conclude
$\mathit{true} \Longrightarrow ie_1 \,\Re_1\, ie_1'$. However, neither Combine_1 nor Combine_2 is able to
compute this invariant. As a result, we define the final routine, Combine_3, given
in Figure 6, to capture these cases.



Combine_3(inv1, inv2)
/* inv1 : ic1 ⟹ ie1 ℜ1 ie'1 */
/* inv2 : ic2 ⟹ ie2 ℜ2 ie'2 */

if (ℜ1 = ℜ2) then
    θ := unify(ie1, ie2);
    γ := unify(ie'1, ie'2);
    if (θ != NIL) and (γ != NIL) then
        derived_ic := simplify((ic1 ∨ ic2)θγ);
        derived_inv := derived_ic ⟹ ((ie1)θγ ℜ1 (ie'1)θγ);
        Return derived_inv;
Return NIL
End-Algorithm

Fig. 6. Combine_3 Algorithm



We emphasize in the Combine_3 Algorithm our use of the simplify routine
introduced just after Definition 4.1. Our example is captured because $x < \ldots \vee x > \ldots$
is simplified to true. By recursively applying Combine_3, one can also handle more
complicated intervals like $x < 0 \vee (x > 0 \wedge x < 1) \vee x > 1$.

Lemma 4.11. $\{inv_1, inv_2\} \models \mathit{Combine\_3}(inv_1, inv_2)$.



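Definition 4.12 (Operator $C_{\mathcal{I}}$). We associate with any set $\mathcal{I}$ of invariants a
mapping $C_{\mathcal{I}} : \mathit{INV} \to \mathit{INV}$ which maps sets of invariants to sets of invariants,
as follows:

$$\begin{aligned}
C_{\mathcal{I}}(X) =_{def}\ & \{\mathit{Combine\_1}(inv_1, inv_2, X \cup \mathcal{I}) \mid inv_1, inv_2 \in X \cup \mathcal{I}\}\ \cup \\
& \{\mathit{Combine\_2}(inv_1, inv_2) \mid inv_1, inv_2 \in X \cup \mathcal{I}\}\ \cup \\
& \{\mathit{Combine\_3}(inv_1, inv_2) \mid inv_1, inv_2 \in X \cup \mathcal{I}\}\ \cup \\
& \mathcal{I} \cup X.
\end{aligned}$$

Definition 4.13 (Powers of $C_{\mathcal{I}}$). The powers of $C_{\mathcal{I}}(X)$ are defined as follows:

$$\begin{aligned}
C_{\mathcal{I}} \uparrow^{0} &= \mathit{Taut} \\
C_{\mathcal{I}} \uparrow^{j+1} &= C_{\mathcal{I}}(C_{\mathcal{I}} \uparrow^{j}) \\
C_{\mathcal{I}} \uparrow^{\omega} &= \bigcup_{j \ge 0} (C_{\mathcal{I}} \uparrow^{j})
\end{aligned}$$

Proposition 4.14 (Monotonicity of $C_{\mathcal{I}}$). If $X_1 \subseteq X_2$, then $C_{\mathcal{I}}(X_1) \subseteq C_{\mathcal{I}}(X_2)$.

Lemma 4.15. $C_{\mathcal{I}}(C_{\mathcal{I}} \uparrow^{\omega}) \subseteq C_{\mathcal{I}} \uparrow^{\omega}$.

Lemma 4.16. $C_{\mathcal{I}} \uparrow^{\omega} \subseteq \{inv \mid \mathcal{I} \models inv\}$.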

What we are really interested in is a converse of the last lemma, namely that all
invariants that follow from $\mathcal{I}$ can be derived. Strictly speaking, this is not the case:
we already noticed that there are many redundant invariants that follow from $\mathcal{I}$ but
are subsumed by others. Such "redundant" invariants contribute little. We show
below that whenever an invariant is entailed from $\mathcal{I}$ as a whole, it is already entailed
by another invariant in $C_{\mathcal{I}} \uparrow^{\omega}$. This is the statement of our main Corollary 4.18.

Theorem 4.17 (All Entailed "$\subseteq$"-Invariants are Subsumed in $C_{\mathcal{I}} \uparrow^{\omega}$).
Suppose $\mathcal{I} \models inv$. We assume further that all the invariants are simple and that
$\Re$ = "$\subseteq$" in all invariants. Then, there is $inv' \in C_{\mathcal{I}} \uparrow^{\omega}$ such that $inv'$ entails $inv$.

Corollary 4.18 (All Entailed Invariants are Subsumed in $C_{\mathcal{I}} \uparrow^{\omega}$). We are now
considering arbitrary simple invariants, i.e. $\Re \in \{\subseteq, =\}$. If $\mathcal{I} \models inv$, then there exists
$inv' \in C_{\mathcal{I}} \uparrow^{\omega}$ such that $inv'$ entails $inv$.

The following corollary tells us that if the implementations of the Chk_Imp and
Chk_Ent algorithms used are complete, then the Compute-Derived-Invariants
algorithm correctly computes all derived invariants.

Corollary 4.19 (Development-Time Check). Suppose Chk_Imp is a complete
implication check, and Chk_Ent is a complete subsumption check algorithm. Then,
the set of invariants returned by Compute-Derived-Invariants has the
following properties:

(1) Every invariant returned by it is implied by $\mathcal{I}$, and

(2) If an invariant $inv$ is implied by $\mathcal{I}$, then there is an invariant $inv'$ returned by
the Compute-Derived-Invariants algorithm that entails $inv$.

Our results above apply to simple invariants only. The reason is that in Table 1
only a subset of all possible derivable invariants is listed. For example, even if the
Chk_Imp tests do not hold, there are still the following nontrivial invariants
entailed:

(1) $\mathit{derived\_inv}_1 : ic_1 \wedge ic_2 \Longrightarrow (ie_1 \cap ie_2) \,\Re\, (ie_1' \cap ie_2')$

(2) $\mathit{derived\_inv}_2 : ic_1 \wedge ic_2 \Longrightarrow (ie_1 \cup ie_2) \,\Re\, (ie_1' \cup ie_2')$

In fact, our framework can be easily extended as follows. Let $inv_1 : ic_1 \Longrightarrow
ie_1 \,\Re_1\, ie_1'$ and $inv_2 : ic_2 \Longrightarrow ie_2 \,\Re_2\, ie_2'$. In addition to the derived invariant returned
by the Combine_1 algorithm, the new extended XCombine_1 also returns the
derived invariants determined by Tables 2 and 3.

Table 2. XCombine for Arbitrary Invariants



5. DEPLOYMENT PHASE 

Once an agent is up and running, it is continuously confronted with requests for its 
services. One crucial observation is that there might be enormous overlap among 
these requests. These overlaps can be exploited if a given set C of code call con- 
ditions are merged in a way that executes common portions only once. However, 
in order to exploit commonalities, we must first determine the type of those com- 
monalities, that is we must first identify code call conditions (1) that are equivalent 
to other code call conditions, (2) that are implied by other code call conditions 
and (3) that overlap with other code call conditions. Moreover, given two code 
call conditions ccci and ccc2, it might be the case that they are neither equivalent, 
nor implied, nor overlapped. On the other hand, parts of ccc1 and ccc2 may be
equivalent, implied or overlapped. We also want to exploit such cases. This gives 
rise to the following definition: 

Definition 5.1 (Sub-Code Call Condition). Let $ccc = \chi_1 \,\&\, \chi_2 \,\&\, \ldots \,\&\, \chi_n$ be a code
call condition. Then $ccc' = \chi_{i_1} \,\&\, \chi_{i_2} \,\&\, \ldots \,\&\, \chi_{i_j}$, for $1 \le i_1, \ldots, i_j \le n$ and
$i_l \ne i_k$ for all $1 \le l < k \le j$, is called a sub-code call condition of $ccc$.

Table 3. XCombine for Arbitrary Invariants

Example 5.2. Let $ccc_1 = \chi_1 \,\&\, \chi_2 \,\&\, \chi_3 \,\&\, \chi_4 \,\&\, \chi_5$. Then, $\chi_1 \,\&\, \chi_2 \,\&\, \chi_3$, $\chi_1 \,\&\, \chi_3 \,\&\, \chi_5$ and
$\chi_2 \,\&\, \chi_5$ are some sub-code call conditions of $ccc_1$.

Note that a code call condition with k atomic/(in)equality code call conditions
has $2^k$ different sub-code call conditions. We are now ready to define equivalent,
implied and overlapping sub-code call conditions $\chi_1, \chi_2$. To do so, we need to fix
the variable(s) in each ccc to which we want to project (see Definition 2.15: the
sequence $\langle v_{i_1}, \ldots, v_{i_k} \rangle$ of variables occurring in $\chi_1$, and the sequence $\langle v'_{i_1}, \ldots, v'_{i_k} \rangle$
of variables occurring in $\chi_2$ are important). Often the sequences consist just of one
single variable: this is the case when there is only one non-base variable occurring
in the ccc's. In that case we do not explicitly mention the sequences.
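
The growth in the number of sub-code call conditions can be made concrete with a few lines of Python. The sketch assumes a code call condition is just a list of atom names and enumerates only the non-empty sub-conjunctions, so it yields $2^k - 1$ of them; the count of $2^k$ given above is matched once the empty conjunction is included. The function name is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def sub_code_call_conditions(atoms: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty sub-conjunctions of atoms, in their original order."""
    return [combo
            for size in range(1, len(atoms) + 1)
            for combo in combinations(atoms, size)]

ccc1 = ["x1", "x2", "x3", "x4", "x5"]
subs = sub_code_call_conditions(ccc1)
```

For the five atoms of Example 5.2 this already gives 31 candidates, which is why a pairwise comparison over all sub-code call conditions becomes expensive quickly.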

Definition 5.3 (Equivalent (Sub-) CCC). Two (sub-) code call conditions $\chi_1$ and
$\chi_2$ are said to be equivalent w.r.t. the sequences $\langle v_{i_1}, \ldots, v_{i_k} \rangle$ and $\langle v'_{i_1}, \ldots, v'_{i_k} \rangle$,
denoted by $\chi_1 \equiv \chi_2$, if and only if for all states S of the agent and all assignments
$\theta$, it is the case that

$$[\chi_1]_{S,\theta,\langle v_{i_1}, \ldots, v_{i_k} \rangle} = [\chi_2]_{S,\theta,\langle v'_{i_1}, \ldots, v'_{i_k} \rangle}.$$

In the case of equivalent ccc's, we only need to execute one of the sub-code call
conditions. We can use the cached solutions for the other sub-code call condition.

Example 5.4. The ccc in(C, excel:chart(excelFile, FinanceRec, day)) is equivalent
w.r.t. the sequences $\langle C \rangle$, $\langle C' \rangle$ to the ccc in(C', excel:chart(excelFile, Rec, day)),
since the two code call conditions unify with the mgu $\gamma$ = [FinanceRec/Rec].



Definition 5.5 (Implied (Sub-) CCC). A (sub-) code call condition $\chi_1$ is said to
imply another (sub-) code call condition $\chi_2$ w.r.t. the sequences $\langle v_{i_1}, \ldots, v_{i_k} \rangle$ and
$\langle v'_{i_1}, \ldots, v'_{i_k} \rangle$, denoted by $\chi_1 \to \chi_2$, if and only if for all states S of the agent and
all assignments $\theta$, it is the case that

$$[\chi_1]_{S,\theta,\langle v_{i_1}, \ldots, v_{i_k} \rangle} \subseteq [\chi_2]_{S,\theta,\langle v'_{i_1}, \ldots, v'_{i_k} \rangle},$$

and it is not the case that $\chi_1 \equiv \chi_2$.

In the case of implied ccc's, we execute and cache the solutions of $\chi_2$. In order
to evaluate $\chi_1$, all we need to do is to use the cached results to restrict the solution
set of $\chi_2$.

Example 5.6. The code call condition in(T1, spatial:range(map1, 5, 5, 30))
implies the code call condition in(T2, spatial:range(map1, 5, 5, 50)), because all the
points that are within 30 units of the point (5, 5) are also within 50 units of (5, 5).
As mentioned above, in this case, we suppress the sequences $\langle T_1 \rangle$ and $\langle T_2 \rangle$.

Definition 5.7 (Overlapping (Sub-) CCC). Two (sub-) code call conditions $\chi_1$
and $\chi_2$ are said to be overlapping w.r.t. the sequences $\langle v_{i_1}, \ldots, v_{i_k} \rangle$ and $\langle v'_{i_1}, \ldots, v'_{i_k} \rangle$,
denoted by $\chi_1 \perp \chi_2$, if and only if for some states S of the agent and for some
assignments $\theta$, it is the case that

$$[\chi_1]_{S,\theta,\langle v_{i_1}, \ldots, v_{i_k} \rangle} \cap [\chi_2]_{S,\theta,\langle v'_{i_1}, \ldots, v'_{i_k} \rangle} \ne \emptyset,$$

and neither $\chi_1 \to \chi_2$ nor $\chi_2 \to \chi_1$.

In the case of overlapping ccc's, we execute and cache the solutions of $\chi_3$, where
$\chi_3$ is a code call condition whose solution set is equal to the union of the
solution sets of $\chi_1$ and $\chi_2$. In order to evaluate both $\chi_1$ and $\chi_2$, we need to access
the cache and restrict the solution set of $\chi_3$ to the solution sets of $\chi_1$ and $\chi_2$, respectively.
Note that the definition of overlapping ccc's requires that the intersection of the
solution sets of $\chi_1$ and $\chi_2$ be non-empty for some state of the agent. This implies
there might be states of the agent where the intersection is empty. However, the
solution set of $\chi_3$ in such a case still contains the solutions to $\chi_1$ and $\chi_2$.

Example 5.8. The code call condition in(T1, rel:rngselect(emp, age, 25, 35))
overlaps with the code call condition in(T2, rel:rngselect(emp, age, 30, 40)), because all
employees between the ages 30 and 35 satisfy both code call conditions.
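
For range selections on a single attribute, the three relationships of Definitions 5.3, 5.5 and 5.7 reduce to interval comparisons. The Python sketch below assumes closed intervals (lo, hi) on the same relation and attribute, as in Example 5.8; the function names and the string labels are hypothetical.

```python
from typing import Tuple

Range = Tuple[float, float]

def classify_ranges(r1: Range, r2: Range) -> str:
    """Relationship between two closed-interval range selections."""
    (lo1, hi1), (lo2, hi2) = r1, r2
    if r1 == r2:
        return "equivalent"
    if lo2 <= lo1 and hi1 <= hi2:
        return "r1 implies r2"        # every answer to r1 also answers r2
    if lo1 <= lo2 and hi2 <= hi1:
        return "r2 implies r1"
    if max(lo1, lo2) <= min(hi1, hi2):
        return "overlapping"          # shared answers possible, no containment
    return "unrelated"

def merged_range(r1: Range, r2: Range) -> Range:
    """Range playing the role of chi_3: it covers both r1 and r2."""
    return (min(r1[0], r2[0]), max(r1[1], r2[1]))
```

For Example 5.8, `classify_ranges((25, 35), (30, 40))` reports an overlap, and `merged_range` gives the single selection (25, 40) from which both answers can be recovered by restriction.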

In order to identify various relationships between code call conditions, we use 
the derived invariants that are computed at development phase. We are now faced 
with the following problem: 

Definition 5.9 (Common sub-ccc identification problem). Given a set of code call
conditions $C = \{ccc_1, ccc_2, \ldots, ccc_n\}$, and a set of derived invariants, $\mathcal{I}^*$, find
all sub-code call conditions of $\bigwedge_{i=1}^{n} ccc_i$ that

— are equivalent with respect to $\mathcal{I}^*$,

— imply one another with respect to $\mathcal{I}^*$,

— overlap with each other with respect to $\mathcal{I}^*$.

The brute-force solution to the above problem is to choose two code call
conditions, $ccc_i$ and $ccc_j$ from C, then traverse the list of invariants, $\mathcal{I}^*$, and apply
each invariant to various sub-code call conditions of $ccc_i$ and $ccc_j$. The algorithm
Brute-Force-CSI, given in Figure 7, implements this approach.

Brute-Force-CSI(C, ℐ*)
/* Input:  C = {ccc1, ccc2, ..., cccn} */
/*         ℐ* = {inv1, inv2, ..., invm} */
/* Output: Eq = {(χi, χj) | χi ≡ χj} */
/*         I = {(χi, χj) | χi → χj} */
/*         O = {(χi, χj, χk) | χi ⊥ χj and χi → χk and χj → χk} */

SC := ccc1 ∧ ccc2 ∧ ... ∧ cccn
SCP := all sub-code call conditions of SC
for all χi ∈ SCP do
    for all χj ≠ χi ∈ SCP do
        for all inv ∈ ℐ* do
            ApplyInvariant(inv, χi, χj, Eq, I, O)
            ApplyInvariant(inv, χj, χi, Eq, I, O)
Return (Eq, I, O)
End-Algorithm

Fig. 7. Brute-Force CSI Algorithm

The Brute-Force-CSI algorithm makes use of an ApplyInvariant routine
which takes as input an invariant and two sub-code call conditions, as well as
the equivalent, implied and overlapped sub-code call conditions sets. It applies the
invariant to the sub-code call conditions, and inserts the relationship entailed by
the invariant into the respective set. This routine is given in Figure 8. Note that
we need to call ApplyInvariant twice with different relative orders for χi and χj.

ApplyInvariant(inv, χi, χj, Eq, I, O)
if (inv is of the form ic ⟹ ie1 = ie2) and
   (∃θ such that χi = (ie1)θ and χj = (ie2)θ and (ic)θ = true) then
    Eq := Eq ∪ {(χi, χj)}        // χi ≡ χj
else if (inv is of the form ic ⟹ ie1 ⊆ ie2) and
   (∃θ such that χi = (ie1)θ and χj = (ie2)θ and (ic)θ = true) then
    I := I ∪ {(χi, χj)}          // χi → χj
else if (inv is of the form ic ⟹ (ie1 ∪ ie2) = ie3) and
   (∃θ such that χi = (ie1)θ and χj = (ie2)θ and (ic)θ = true) then
    χk := (ie3)θ
    O := O ∪ {(χi, χj, χk)}      // χi ⊥ χj
Return
End-Algorithm

Fig. 8. ApplyInvariant Routine

Assuming ApplyInvariant takes constant time to execute, the complexity of the
Brute-Force-CSI algorithm is $O(m \cdot 2^{2k})$, where m is the number of invariants in
$\mathcal{I}^*$ and k is the number of atomic/(in)equality code call conditions in SC. However,
one important observation is that we do not have to apply each invariant to all
possible sub-code call conditions. An invariant expression can only unify with a
(sub-) code call condition if both contain "similar" (sub-) code call conditions. The
performance of the Brute-Force-CSI algorithm can be significantly improved by
making use of this observation. But, before describing this improved CSI algorithm,
let us first define similar (sub-) code call conditions.

Definition 5.10 (Similar (sub-) code call conditions). Two (sub-) code call
conditions $\chi_1$ and $\chi_2$ are said to be similar if one of the following holds:

— Both $\chi_1$ and $\chi_2$ are atomic code call conditions of the form in(·, d:f(·)).

— Both $\chi_1$ and $\chi_2$ are equality/inequality code call conditions.

— $\chi_1$ is of the form $\chi_{11} \,\&\, \chi_{12}$ and $\chi_2$ is of the form $\chi_{21} \,\&\, \chi_{22}$, and $\chi_{11}$ is similar to
$\chi_{21}$ and $\chi_{12}$ is similar to $\chi_{22}$.



Improved-CSI(C, ℐ*)
/* Input:  C = {ccc1, ccc2, ..., cccn} */
/*         ℐ* = {inv1, inv2, ..., invm} */
/* Output: Eq = {(χi, χj) | χi ≡ χj} */
/*         I = {(χi, χj) | χi → χj} */
/*         O = {(χi, χj, χk) | χi ⊥ χj and χi → χk and χj → χk} */

(1) {G1, G2, ..., Gl} := Classify(C);
(2) for all Gi for i = 1, ..., l do
(3)     ℐ' := {inv | inv contains similar sub-code call conditions with Gi}
(4)     for all χj ∈ Gi do
(5)         for all χk ≠ χj ∈ Gi do
(6)             for all inv ∈ ℐ' do
(7)                 ApplyInvariant(inv, χj, χk, Eq, I, O)
(8)                 ApplyInvariant(inv, χk, χj, Eq, I, O)
(9) Return (Eq, I, O)
(10) End-Algorithm

Fig. 9. Improved CSI Algorithm



The Improved-CSI algorithm is given in Figure 9. Lines (1) and (3) of the
algorithm need further explanation. In order to facilitate fast unification of
sub-code call conditions with invariant expressions, the Classify(C) routine in the
Improved-CSI algorithm organizes sub-code call conditions into groups such that
each group contains similar sub-code call conditions. Example 5.11 demonstrates
how Classify(C) works.

Fig. 10. Organization of Sub-code Call Conditions of Example 5.11



Example 5.11. Consider the following code call conditions:

$\chi_{11}$ = in(FinanceRec, rel : select(financeRel, sales, ">", 10K))
$\chi_{12}$ = FinanceRec.date > "6/6/2000"
$\chi_{13}$ = in(C, excel : chart(excelFile, FinanceRec, day))
$\chi_{21}$ = in(FinanceRec, rel : select(financeRel, sales, ">", 20K))
$\chi_{22}$ = FinanceRec.date = "7/7/2000"
$\chi_{23}$ = in(C, excel : chart(excelFile, FinanceRec, day))
$\chi_{31}$ = in(FinanceRec, rel : select(financeRel, sales, ">", 30K))
$\chi_{32}$ = FinanceRec.date = "7/7/2000"
$\chi_{33}$ = in(C, excel : chart(excelFile, FinanceRec, month)).

Let $ccc_1 = \chi_{11}$ & $\chi_{12}$ & $\chi_{13}$, $ccc_2 = \chi_{21}$ & $\chi_{22}$ & $\chi_{23}$ and $ccc_3 = \chi_{31}$ & $\chi_{32}$ & $\chi_{33}$. Figure 10
shows how sub-code call conditions of $ccc_1$, $ccc_2$ and $ccc_3$ are grouped.
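One way to realize such a grouping is to key every sub-code call condition by a "signature" that is identical for similar conditions. The sketch below is an assumption-laden illustration rather than the paper's Classify routine: the tuple encoding, the helper names and the choice to enumerate every order-preserving selection of conjuncts as a sub-code call condition are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def signature(cond):
    """Similarity key: (domain, function) for an atom, 'cmp' for a comparison."""
    if cond[0] == "in":          # ("in", var, domain, func, args)
        return (cond[2], cond[3])
    return "cmp"                 # ("cmp", left, op, right)


def sub_conditions(ccc):
    """Yield every non-empty, order-preserving selection of conjuncts of ccc."""
    for size in range(1, len(ccc) + 1):
        yield from combinations(ccc, size)


def classify(cccs):
    """Group all sub-code call conditions so that each group holds similar ones."""
    groups = defaultdict(list)
    for ccc in cccs:
        for sub in sub_conditions(ccc):
            groups[tuple(signature(c) for c in sub)].append(sub)
    return dict(groups)


def select(value):
    return ("in", "FinanceRec", "rel", "select", ("financeRel", "sales", ">", value))


def chart(unit):
    return ("in", "C", "excel", "chart", ("excelFile", "FinanceRec", unit))


def date(op, day):
    return ("cmp", "FinanceRec.date", op, day)


ccc1 = (select("10K"), date(">", "6/6/2000"), chart("day"))
ccc2 = (select("20K"), date("=", "7/7/2000"), chart("day"))
ccc3 = (select("30K"), date("=", "7/7/2000"), chart("month"))
groups = classify([ccc1, ccc2, ccc3])
```

Under these assumptions the three code call conditions of Example 5.11 fall into seven groups of three members each, for instance one group holding the three `rel : select` atoms and one holding the three date comparisons.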

Line (3) of the algorithm identifies a subset $\mathcal{I} \subseteq \mathcal{I}^*$ of invariants that are applicable
to a given group of sub-code call conditions. In order to speed up this task, the
invariants are stored in a hash table based on the in($\cdot$, d : f($\cdot$))'s they contain. Given
a group of sub-code call conditions, we apply only those invariants which contain
similar sub-code call conditions (lines (7) and (8)). The Improved-CSI algorithm
also uses ApplyInvariant to compute various relationships. However, the number
of times it is invoked is much smaller than the number of times it is invoked in the
Brute-Force-CSI algorithm. Example 5.12 demonstrates how the Improved-CSI
algorithm works.

Example 5.12. The algorithm first processes group-0 (Figure 10). It identifies all
invariants containing in($\cdot$, rel : select($\cdot$)) code calls. Then, it tries to apply each of
those invariants to various combinations of this group. For example, the following
invariant will unify with pairs of code call conditions in this group:

Rel = Rel' $\wedge$ Attr = Attr' $\wedge$ Op = Op' = ">" $\wedge$ Val > Val'
$\Longrightarrow$ in(X, rel : select(Rel, Attr, Op, Val)) $\subseteq$ in(Y, rel : select(Rel', Attr', Op', Val')).

As a result of the application of this invariant the following relationships are found:

in(FinanceRec, rel : select(financeRel, sales, ">", 20K)) $\rightarrow$ in(FinanceRec, rel : select(financeRel, sales, ">", 10K))
in(FinanceRec, rel : select(financeRel, sales, ">", 30K)) $\rightarrow$ in(FinanceRec, rel : select(financeRel, sales, ">", 10K))
in(FinanceRec, rel : select(financeRel, sales, ">", 30K)) $\rightarrow$ in(FinanceRec, rel : select(financeRel, sales, ">", 20K))



The same procedure is applied to group-1, resulting in the discovery of the following relationships:

FinanceRec.date = "7/7/2000" $\rightarrow$ FinanceRec.date > "6/6/2000"
FinanceRec.date = "7/7/2000" $\equiv$ FinanceRec.date = "7/7/2000"

As a result of processing group-2, the following relationship is found:

in(C, excel : chart(excelFile, FinanceRec, day)) $\equiv$ in(C, excel : chart(excelFile, FinanceRec, day)).

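We process the other groups similarly. When processing group-5, we only apply
invariants containing both in($\cdot$, rel : select($\cdot$)) and in($\cdot$, excel : chart($\cdot$)) code calls.
Finally, the following relationships are found:

Between $ccc_2$ and $ccc_1$:
$\chi_{21} \rightarrow \chi_{11}$
$\chi_{22} \rightarrow \chi_{12}$
$\chi_{13} \equiv \chi_{23}$
$\chi_{21}$ & $\chi_{22} \rightarrow \chi_{11}$ & $\chi_{12}$
$\chi_{22}$ & $\chi_{23} \rightarrow \chi_{12}$ & $\chi_{13}$
$\chi_{21}$ & $\chi_{23} \rightarrow \chi_{11}$ & $\chi_{13}$
$\chi_{21}$ & $\chi_{22}$ & $\chi_{23} \rightarrow \chi_{11}$ & $\chi_{12}$ & $\chi_{13}$

Between $ccc_3$ and $ccc_1$:
$\chi_{31} \rightarrow \chi_{11}$
$\chi_{32} \rightarrow \chi_{12}$
$\chi_{31}$ & $\chi_{32} \rightarrow \chi_{11}$ & $\chi_{12}$

Between $ccc_3$ and $ccc_2$:
$\chi_{31} \rightarrow \chi_{21}$
$\chi_{22} \equiv \chi_{32}$
$\chi_{31}$ & $\chi_{32} \rightarrow \chi_{21}$ & $\chi_{22}$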



It is important to note that in the above algorithms the derived invariants computed
during the development phase are used to match the sub-code call conditions. This
assumes that the derived invariants are complete, that is, they contain all possible
relationships derivable from $\mathcal{I}$. However, this may be too costly to compute. Moreover,
we may end up storing many invariants which never match any of the
sub-code call conditions. One solution to this problem is to restrict the length of
invariant expressions in the derived invariants. However, in that case we need to
perform some inferencing at deployment to make sure that we compute all sub-code
call condition relationships.

Hence, in the case of incomplete derived invariants, we also need to perform a
second phase where we use the inference rules in Table 4 to deduce further relationships.

Table 4. Inference Rules Used in Improved-CSI Algorithm



Once the agent identifies equivalent, implied and overlapping sub-code call con- 
ditions in C, it merges those sub-code call conditions to decrease execution costs. 
In the next section we will describe how to merge a set of sub-code call conditions. 

5.1 Merging Code Call Conditions 

In this section, we first describe a technique for evaluating costs of code call conditions.
We then describe two algorithms, DFMerge and BFMerge, which are used to process
the set $C = \{ccc_1, ccc_2, \ldots, ccc_n\}$ of code call
conditions. Both of these algorithms are parameterized by a selection strategy.
Later, in our experiments, we will try multiple alternative selection strategies in
order to determine which ones work best. We will also compare the performance
of the DFMerge and BFMerge algorithms so as to assess the efficiency of
computation of these algorithms.

5.1.1 Cost Estimation for Code Call Conditions. In this section, we describe how
to estimate the cost of merged code call conditions for a set $C = \{ccc_1, ccc_2, \ldots, ccc_n\}$
of code call conditions. We assume that there is a cost model that can assess costs
of individual code call conditions. Such costing mechanisms have already been
developed for heterogeneous sources by [Du et al. 1992; Adali et al. 1996; Naacke
et al. 1998; Roth et al. 1999]. Using this, we may state the cost of a single code
call condition.

Definition 5.13 (Single Code Call Condition Cost). The cost of a code call condition
$ccc$ is defined as: $cost(ccc) = \sum_{\chi_i \in ccc} cost(\chi_i)$, where $cost(\chi_i)$ is the cost of
executing the atomic or equality/inequality code call condition $\chi_i$. Note that the cost
of $\chi_i$ may include a variety of parameters such as disk/buffer read time, network
retrieval time, network delays, etc.

We may now extend this definition to describe the coalesced cost of executing
two code call conditions $ccc_k$ and $ccc_{k+1}$.

Definition 5.14 (Coalesced cost). The coalesced cost of executing code call conditions
$ccc_k$ and $ccc_{k+1}$ by exploiting equivalent, implied and overlapped sub-code call
conditions of $ccc_k$ and $ccc_{k+1}$ is defined as:

\[ coalesced\_cost(ccc_k, ccc_{k+1}) = cost(ccc_k) + cost(ccc_{k+1}) - gain(ccc_k, ccc_{k+1}) \]

where $gain(ccc_k, ccc_{k+1})$ is the cost of the savings obtained by sharing sub-code call
conditions between $ccc_k$ and $ccc_{k+1}$.



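We are now left with the problem of defining the concept of gain used above.

Definition 5.15 (Gain of two sub-ccc's). Suppose $\chi_i$ and $\chi_j$ are sub-code call conditions
in $ccc_k$ and $ccc_{k+1}$, respectively, and $\mathcal{I}$ is a set of invariants. Then, the gain
of executing $\chi_i$, $\chi_j$ is defined as:

\[
gain(\chi_i, \chi_j) =
\begin{cases}
cost(\chi_i) & \text{if } \mathcal{I} \models \chi_i \equiv \chi_j \\
cost(\chi_i) - cost(eval(\chi_i, \chi_j)) & \text{if } \mathcal{I} \models \chi_i \rightarrow \chi_j \\
exp_k & \text{if } \mathcal{I} \models \chi_i \leftrightarrow \chi_j \text{ and } \mathcal{I} \models \chi_i \rightarrow \chi_k \text{ and } \mathcal{I} \models \chi_j \rightarrow \chi_k \\
0 & \text{otherwise}
\end{cases}
\]

where $exp_k = cost(\chi_i) + cost(\chi_j) - cost(\chi_k) - cost(eval(\chi_i, \chi_k)) - cost(eval(\chi_j, \chi_k))$
and $eval(\chi_i, \chi_j)$ is the task of executing code call condition $\chi_i$ by using the results
of code call condition $\chi_j$.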

An explanation of the above definition is important. If code call conditions
$\chi_i$ and $\chi_j$ are equivalent, then we only need to execute one of them, leading to
a saving of $cost(\chi_i)$. If $\chi_i \rightarrow \chi_j$ (i.e. $\chi_j$'s answers include those of $\chi_i$) then
we can first evaluate $\chi_j$, and then select the answers of $\chi_i$ from the answers to
$\chi_j$. A third possibility is that $\chi_i$ and $\chi_j$ overlap, and there exists a code call
condition $\chi_k$ such that $\chi_k$ is implied by both $\chi_i$ and $\chi_j$. In this case, we can compute
$\chi_k$ first, and then use the result to select the answers of $\chi_i$ and $\chi_j$. The cost of this is
$cost(\chi_k) + cost(eval(\chi_i, \chi_k)) + cost(eval(\chi_j, \chi_k))$. As the cost of executing $\chi_i$, $\chi_j$
sequentially is $cost(\chi_i) + cost(\chi_j)$, the gain is computed by taking the difference,
leading to the third expression. We now define the gain for two code call conditions
in terms of the gains of their sub-code call conditions involved.

Definition 5.16 (Gain of two code call conditions). The gain for $ccc_k$ and $ccc_{k+1}$
is defined as:

\[ gain(ccc_k, ccc_{k+1}) = \sum_{\chi_i \in ccc_k,\ \chi_j \in ccc_{k+1}} gain(\chi_i, \chi_j). \]

Example 5.17. Consider the following code call conditions:

$\chi_1$ = in(FinanceRec, rel : select(financeRel, sales, ">", 20K)),
$\chi_2$ = in(C, excel : chart(C, FinanceRec, day)),
$\chi_3$ = in(FinanceRec, rel : select(financeRel, sales, ">", 10K))

Let $ccc_1 = \chi_1$ & $\chi_2$ and $ccc_2 = \chi_3$. It is evident that $ANS(\chi_1) \subseteq ANS(\chi_3)$. Suppose
further that the costs of these code call conditions are given as: $cost(\chi_1) = 25$,
$cost(\chi_2) = 10$, $cost(\chi_3) = 15$ and $cost(eval(\chi_1, \chi_3)) = 10$. Then, $gain(ccc_1, ccc_2)
= \sum_{\chi_i \in ccc_1,\ \chi_j \in ccc_2} gain(\chi_i, \chi_j) = gain(\chi_1, \chi_3)$ because $gain(\chi_2, \chi_3) = 0$, as there
is no relation between code call conditions $\chi_2$ and $\chi_3$. As $\chi_1 \rightarrow \chi_3$, $gain(\chi_1, \chi_3)
= cost(\chi_1) - cost(eval(\chi_1, \chi_3)) = 25 - 10 = 15$. Then, the coalesced cost of
$ccc_1$ and $ccc_2$ is given by $coalesced\_cost(ccc_1, ccc_2) = cost(ccc_1) + cost(ccc_2) -
gain(ccc_1, ccc_2) = (25 + 10) + 15 - 15 = 35$.
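The arithmetic of Definitions 5.14 to 5.16 can be checked with a few lines of code. The function and parameter names below are hypothetical; the relationship between two sub-code call conditions is passed in as a string rather than derived from invariants, and the numbers are those of Example 5.17.

```python
def gain(relation, cost_i, cost_j=0.0, cost_k=0.0, eval_i=0.0, eval_j=0.0):
    """Gain of sharing work between chi_i and chi_j (Definition 5.15).

    relation is one of "equivalent", "implies" (chi_i -> chi_j), "overlap", or None.
    For "implies", eval_i is cost(eval(chi_i, chi_j)).
    For "overlap", eval_i and eval_j are cost(eval(chi_i, chi_k)) and cost(eval(chi_j, chi_k)).
    """
    if relation == "equivalent":
        return cost_i
    if relation == "implies":
        return cost_i - eval_i
    if relation == "overlap":
        return cost_i + cost_j - cost_k - eval_i - eval_j
    return 0.0


def coalesced_cost(cost_a, cost_b, pair_gains):
    """Coalesced cost of two code call conditions (Definitions 5.14 and 5.16)."""
    return cost_a + cost_b - sum(pair_gains)


# Example 5.17: chi_1 -> chi_3, and chi_2 is unrelated to chi_3.
g13 = gain("implies", cost_i=25, eval_i=10)
g23 = gain(None, cost_i=10)
total = coalesced_cost(25 + 10, 15, [g13, g23])
```

Here `g13` evaluates to 15 and `total` to 35, matching the example.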

5.1.2 Merging Code Call Conditions. We now develop two algorithms that produce
a global merged code call evaluation graph for a set $C = \{ccc_1, ccc_2, \ldots, ccc_n\}$
of code call conditions. These algorithms use the cceg representation of each code
call condition $ccc_i$, and merge two graphs at a time until all graphs are merged into
a final global code call evaluation graph. They make use of a merge routine which
merges code call evaluation graphs by using the Eq, I and O sets generated by the
Improved-CSI algorithm.

While merging two code call evaluation graphs, the merge routine may need
to delete some nodes from the ccegs. Recall that in a cceg, a node represents
an atomic/(in)equality code call condition. The following procedure is applied
recursively to delete a node $\chi_i$ from a code call evaluation graph:

(1) First the node $\chi_i$ is removed.

(2) Then all incoming edges $(\chi_j, \chi_i)$ and all outgoing edges $(\chi_i, \chi_l)$ are deleted.

(3) If any of the nodes $\chi_j$ encountered in the previous step has no outgoing edges,
then node $\chi_j$ is also deleted recursively.
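The three-step deletion procedure can be sketched on a plain adjacency-set graph. The representation (a dictionary mapping each node to the set of its successors) and the function name are assumptions made for illustration only.

```python
def delete_node(succ, node):
    """Delete node from the graph, cascading to predecessors left without outgoing edges.

    succ maps every node to the set of its successors and is modified in place.
    """
    if node not in succ:
        return
    # Nodes chi_j with an incoming edge (chi_j, node).
    preds = [p for p, outs in succ.items() if node in outs]
    # Step (1): remove the node itself, which also drops its outgoing edges.
    del succ[node]
    # Step (2): remove all incoming edges.
    for p in preds:
        succ[p].discard(node)
    # Step (3): recursively delete predecessors that now have no outgoing edges.
    for p in preds:
        if p in succ and not succ[p]:
            delete_node(succ, p)


graph = {"a": {"c"}, "b": {"c", "d"}, "c": set(), "d": set()}
delete_node(graph, "c")

chain = {"r": {"a"}, "a": {"c"}, "c": set()}
delete_node(chain, "c")
```

In the first graph, node `a` fed only `c` and is therefore removed as well, while `b` survives because it still feeds `d`. In the chain, the deletion cascades all the way back to `r`.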

The merge routine uses a set of three transformations which we define now. 
The first transformation takes a set of graphs of equivalent code call conditions and 
creates a single graph. 

Definition 5.18 (Equivalence Transformation, T1). Let $C = \{C_1, C_2, \ldots, C_m\}$ be a set
of code call conditions. Let $\mathcal{CCEG} = \{cceg(C_1), cceg(C_2), \ldots, cceg(C_m)\}$ be their
code call evaluation graphs. Given a set Eq of equivalent code call conditions, which
are sub-cccs of the $C_i$'s, the equivalence transformation T1 is defined as follows:

T1: $cceg(C) = (\bigcup_{1 \leq i \leq m} V_i, \bigcup_{1 \leq i \leq m} E_i)$
    $Eq' := \{(\chi_i, \chi_j) \mid (\chi_i, \chi_j) \in Eq$ and $\nexists (\chi'_i, \chi'_j) \in Eq$ such that $\chi_i$ is a sub-ccc
          of $\chi'_i$ and $\chi_j$ is a sub-ccc of $\chi'_j\}$
    for all $(\chi_i, \chi_j) \in Eq'$ do
        if $gain(\chi_i, \chi_j) > 0$ then
            delete all the nodes corresponding to atomic cccs in $\chi_i$
                from $cceg(C)$ recursively
            delete all outgoing edges $(\chi_i, \chi_k) \in cceg(C)$
            create the edges $(\chi_j, \chi_k) \in cceg(C)$

The second transformation (T2) below takes a set of graphs of code call condi- 
tions, together with a set of known implications between sub-code call conditions of 
these code call conditions. Using these known implications, it merges these graphs 
into one. 

Definition 5.19 (Implication Transformation, T2). Let $C = \{C_1, C_2, \ldots, C_m\}$ be a set
of code call conditions. Let $\mathcal{CCEG} = \{cceg(C_1), cceg(C_2), \ldots, cceg(C_m)\}$ be their
code call evaluation graphs. Given a set I of implied code call conditions, which are
sub-cccs of the $C_i$'s, the implication transformation T2 is defined as follows:

T2: $cceg(C) = (\bigcup_{1 \leq i \leq m} V_i, \bigcup_{1 \leq i \leq m} E_i)$
    $I' := \{(\chi_i, \chi_j) \mid (\chi_i, \chi_j) \in I$ and $\nexists (\chi'_i, \chi'_j) \in I$ such that $\chi_i$ is a sub-ccc
          of $\chi'_i$ and $\chi_j$ is a sub-ccc of $\chi'_j\}$
    for all $(\chi_i, \chi_j) \in I'$ do
        if $gain(\chi_i, \chi_j) > 0$ then
            delete all incoming edges $(\chi_l, \chi_i) \in cceg(C)$ to $\chi_i$ recursively
            create the edge $(\chi_j, \chi_i) \in cceg(C)$
            set $cost(\chi_i)$ to $cost(eval(\chi_i, \chi_j))$

The third transformation (T3) below takes a set of graphs of code call conditions,
together with a set of known overlaps between sub-code call conditions of these code
call conditions. Using these known overlaps, it merges these graphs into one.

Definition 5.20 (Overlap Transformation, T3). We consider the set $C = \{C_1, C_2, \ldots, C_m\}$
of code call conditions. Let $\mathcal{CCEG} = \{cceg(C_1), cceg(C_2), \ldots, cceg(C_m)\}$ be their
code call evaluation graphs. Given a set O of overlapping code call conditions, which
are sub-cccs of the $C_i$'s, the overlap transformation T3 is defined as follows:

T3: $cceg(C) = (\bigcup_{1 \leq i \leq m} V_i, \bigcup_{1 \leq i \leq m} E_i)$
    $O' := \{(\chi_i, \chi_j, \chi_k) \mid (\chi_i, \chi_j, \chi_k) \in O$ and $\nexists (\chi'_i, \chi'_j, \chi'_k) \in O$ such that $\chi_i$ is a sub-ccc
          of $\chi'_i$ and $\chi_j$ is a sub-ccc of $\chi'_j\}$
    for all $(\chi_i, \chi_j, \chi_k) \in O'$ do
        if $gain(\chi_i, \chi_j) > 0$ then
            create a node $\chi_k \in cceg(C)$
            create edges $(\chi_k, \chi_i) \in cceg(C)$ and $(\chi_k, \chi_j) \in cceg(C)$
            delete all incoming edges $(\chi_l, \chi_i) \in cceg(C)$ to $\chi_i$ recursively
            delete all incoming edges $(\chi_m, \chi_j) \in cceg(C)$ to $\chi_j$ recursively
            create edges $(\chi_l, \chi_k) \in cceg(C)$ and $(\chi_m, \chi_k) \in cceg(C)$
            set $cost(\chi_i)$ to $cost(eval(\chi_i, \chi_k))$
            set $cost(\chi_j)$ to $cost(eval(\chi_j, \chi_k))$

The merge routine merely applies the above three transformations sequentially
in the order (T1), (T2), (T3).

Definition 5.21 (The Merge Routine). The merge routine takes as input a set
of code call evaluation graphs, and the sets of equivalent, implied and overlapped
sub-code call conditions, and uses T1, T2 and T3 to produce a single code call
evaluation graph. It is given by the following:

\[ merge(\mathcal{CCEG}, Eq, I, O) = T3(T2(T1(\mathcal{CCEG}, Eq), I), O). \]

The merge routine works as follows: First, it gets the sets of equivalent, implied
and overlapped sub-code call conditions from the Improved-CSI algorithm.
Then, it applies the merge transformations in the order T1, T2, T3. The intuition
behind this order is the fact that the maximum gain is obtained by merging
equivalent code call conditions.

The merge routine can be utilized with any search paradigm (e.g. depth-first
search, dynamic programming, etc.) to obtain an algorithm which creates a "global"
code call evaluation graph. In Figures 11 and 12 we provide two algorithms that
use the merge routine to create a global code call evaluation graph. Both algorithms
merge two graphs at a time until a single graph is obtained. The DFMerge
algorithm starts with the empty graph, and chooses the next "best" code call evaluation
graph to merge with the current global code call evaluation graph. This
process is executed iteratively. On the other hand, the BFMerge algorithm picks
the "best" pair of code call evaluation graphs to merge from the ToDo list, which
initially contains all code call evaluation graphs. Upon merging, the merged code
call evaluation graph replaces the two code call evaluation graphs being merged.
This process is executed iteratively until only one code call evaluation graph remains
in the ToDo list.



DFMerge($\mathcal{CCEG}$, Eq, I, O)
/* Input: $\mathcal{CCEG} = \{cceg_1, \ldots, cceg_n\}$ */
/* Output: a global cceg */

ToDo := $\mathcal{CCEG}$;
currentGraph := selectNext(ToDo, NIL);
delete currentGraph from ToDo;
while (ToDo is not empty) do
    nextGraph := selectNext(ToDo, currentGraph);
    delete nextGraph from ToDo;
    currentGraph := merge({currentGraph, nextGraph}, Eq, I, O);
Return currentGraph;
End-Algorithm

Fig. 11. DFMerge Algorithm



BFMerge($\mathcal{CCEG}$, Eq, I, O)
/* Input: $\mathcal{CCEG} = \{cceg_1, \ldots, cceg_n\}$ */
/* Output: a global cceg */

ToDo := $\mathcal{CCEG}$;
while (card(ToDo) > 1) do
    ($cceg_i$, $cceg_j$) := selectNextPair(ToDo);
    delete $cceg_i$, $cceg_j$ from ToDo;
    newGraph := merge({$cceg_i$, $cceg_j$}, Eq, I, O);
    insert newGraph into ToDo;
Return $cceg_i \in$ ToDo;
End-Algorithm

Fig. 12. BFMerge Algorithm



The success of both the DFMerge and BFMerge algorithms depends very
much on how the next "best" merge candidate(s) are selected. Below, we present
three alternative strategies for doing this which we have used in our experiments.

Strategy 1: 

DFMerge: Choose the graph which has the largest number of equivalent code 
call conditions with the currentGraph. 

BFMerge: Choose a pair of graphs which have the largest number of equiva- 
lent code call conditions. 

Strategy 2:

DFMerge: Choose the graph which has the largest number of equivalent, implied
or overlapped code call conditions in common with the currentGraph.

BFMerge: Choose a pair of graphs which have the largest number of equiva- 
lent, implied or overlapped code call conditions between the two of them. 

Strategy 3: 

DFMerge: Choose the graph which leads to the greatest gain with the
currentGraph.

BFMerge: Choose the pair of graphs the associated gain of which is maximal. 
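
The two search schemes differ only in what they compare at each step, which the following sketch makes explicit. It is an illustration under simplifying assumptions, not the authors' code: graphs are stood in for by frozensets of node labels, `merge` and `score` are caller-supplied functions (hypothetical names), and the toy `shared` score counts common labels in the spirit of Strategy 1.

```python
from itertools import combinations


def df_merge(graphs, merge, score):
    """Depth-first style: grow one current graph, always adding the best next graph."""
    todo = list(graphs)
    current = max(todo, key=lambda g: score(g, None))   # selectNext(ToDo, NIL)
    todo.remove(current)
    while todo:
        nxt = max(todo, key=lambda g: score(g, current))
        todo.remove(nxt)
        current = merge(current, nxt)
    return current


def bf_merge(graphs, merge, score):
    """Breadth-first style: repeatedly merge the best pair anywhere in the ToDo list."""
    todo = list(graphs)
    while len(todo) > 1:
        a, b = max(combinations(todo, 2), key=lambda pair: score(pair[0], pair[1]))
        todo.remove(a)
        todo.remove(b)
        todo.append(merge(a, b))
    return todo[0]


def shared(a, b):
    """Toy selection score: number of node labels two graphs have in common."""
    return 0 if b is None else len(a & b)


def recording_union(log):
    """Toy merge that unions the label sets and records the order of merges."""
    def union(a, b):
        log.append((a, b))
        return a | b
    return union


g1, g2, g3 = frozenset("ab"), frozenset("bc"), frozenset("x")
df_log, bf_log = [], []
df_result = df_merge([g3, g1, g2], recording_union(df_log), shared)
bf_result = bf_merge([g3, g1, g2], recording_union(bf_log), shared)
```

Both calls end with the same merged label set, but the logs differ: the depth-first variant is committed to its starting graph `g3`, whereas the breadth-first variant first merges the pair `g1`, `g2` that actually shares a node.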

5.1.3 Executing the Global CCEG. The final problem that needs to be addressed
is to find an execution order for the global code call evaluation graph. Any topo- 
logical sort of the global cceg is a valid execution order. However, there might 
be several topological sorts that can be obtained from the global cceg, and some 
of them might be preferable to others. For example, a topological sort that gives 
preferences to certain nodes, i.e. outputs them earlier in the sequence, might be de- 
sirable. In order to find such an execution order, we compute weights for topological 
sorts. 

Definition 5.22 (Weight of a topological sort). Let $\pi$ be a topological sort, and
$weight_\pi(i)$ be the weight of the $i$th node in the topological sort. If we have $n$ total
nodes, the weight of $\pi$, denoted by $weight(\pi)$, is given by



Any topological sort that minimizes $weight(\pi)$ gives a desirable execution order.
Besides, we can implement various strategies with this function simply by assigning
weights accordingly. For example, if we want to favor nodes that output results,
we can assign larger weights to such nodes. In order to find the topological sort
with the minimum $weight(\pi)$, we use a modified topological sort algorithm which
is given in Figure 13.



FindExecutionOrder(cceg)
/* Input: global cceg */
/* Output: a topological sort $\pi$ that minimizes $weight(\pi)$ */

D := {v | v has in-degree 0}
while D is not empty do
    v' := node with the highest weight in D,
    output v',
    remove v' from D,
    delete all outgoing edges of v',
    D := D $\cup$ {v | v has in-degree 0, v $\notin$ D}
End-Algorithm

Fig. 13. Modified Topological Sort Algorithm That Finds the Minimal $weight(\pi)$
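
The greedy rule of Figure 13 (among the nodes whose predecessors have all been output, emit the one with the highest weight) maps naturally onto a priority queue. The sketch below assumes a successor-list representation and a `weight` dictionary, both hypothetical; it implements the selection rule only and makes no claim about optimality with respect to $weight(\pi)$.

```python
import heapq


def find_execution_order(succ, weight):
    """Topological order that always emits the heaviest currently available node.

    succ maps every node to a list of its successors (every node must be a key);
    weight maps every node to a number. Ties are broken by node label.
    """
    indegree = {v: 0 for v in succ}
    for outs in succ.values():
        for w in outs:
            indegree[w] += 1
    # Max-heap via negated weights, holding the set D of in-degree-0 nodes.
    heap = [(-weight[v], v) for v, d in indegree.items() if d == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        _, v = heapq.heappop(heap)
        order.append(v)
        # Deleting v's outgoing edges may free some of its successors.
        for w in succ[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                heapq.heappush(heap, (-weight[w], w))
    return order


succ = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
weight = {"a": 1, "b": 5, "c": 2, "d": 9}
order = find_execution_order(succ, weight)
```

With these weights, `b` is emitted before `a` because both are available at the start and `b` is heavier; `d` comes last despite its large weight because it depends on `c`.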
6. EXPERIMENTS

We ran various sets of experiments on a Sun Ultra 1 machine with 320 MB memory
running Solaris 2.6. In the first set of these experiments, we study the execution
time of the Create-cceg algorithm. Specifically, we evaluate the performance of
this algorithm with varying number of dependencies and conjunctions in the code
call conditions. In the second set of the experiments, we study the execution time of
the development phase component. In particular, we study the trade-offs involved
in the generic Chk_Imp Algorithm. In the last set of experiments, we demonstrate
the efficiency of the Improved-CSI Algorithm, as well as the merging algorithms.
We compare the performance of the merging algorithms (with different strategies)
with the A* algorithm of [Sellis 1988]. Our implementation of the development
phase and deployment phase components involved over 9,500 lines of C++ code.

6.1 Performance Evaluation of the Create-cceg Algorithm

To evaluate the performance of the Create-cceg algorithm, we generated several 
code call conditions, with varying number of conjuncts and number of dependencies. 
In the first set of experiments, we kept the number of dependencies constant and 
varied the number of conjuncts from 5 to 40. We repeated the same experiments 
when 10, 15, 20 and 25 dependencies are present. For each combination of number 
of dependencies and conjuncts, we created 500 code call conditions and recorded 
the average running time. Figure 14 shows the results. As seen from the figure, the
execution time increases linearly with the number of conjuncts. The Create-cceg 
algorithm is extremely fast, taking only 14 milliseconds for code call conditions 
involving 40 conjunctions and 25 dependencies. 



[Figure: execution time of the Create-cceg algorithm versus number of conjuncts, one curve per fixed number of dependencies]

Fig. 14. Execution Time of Create-cceg (constant number of dependencies)



In the second set of experiments, we kept the number of conjuncts constant,
and varied the number of dependencies from 10 to 50. We ran four experiments
with 10, 20, 30 and 40 conjuncts. Again, we generated 500 code call
conditions for each combination and used the average running time. The results are
given in Figure 15. Again, the execution time increases linearly with the number
of dependencies.

[Figure: execution time of the Create-cceg algorithm versus number of dependencies, one curve per fixed number of conjuncts]

Fig. 15. Execution Time of Create-cceg (constant number of conjuncts)

6.2 Performance Evaluation of the Development Phase Component 

In order to evaluate the performance of the development phase component, we conducted
a set of experiments which use the Chk_Ent algorithm described earlier
and different instances of the Chk_Imp Algorithm. We varied the threshold
and the Axiomatic Inference System used by Chk_Imp. The instances we
used are described in Table 5. As the instance number increases, the complexity of
the Chk_Imp Algorithm also increases.




Instance      Threshold   Axiomatic Inference System
Instance 0    $\infty$    $\chi \subseteq \chi$;  $\chi \cap \chi' \subseteq \chi$;  $\chi \subseteq \chi \cup \chi'$
Instance i    i           All rules in Appendix
Instance $\omega$    $\infty$    All rules in Appendix

Table 5. Instances of the Chk_Imp Algorithm



We ran a set of experiments with two different data sets, namely the spatial domain
invariants and the relational domain invariants, which are given in the Appendix. For
each instance of the algorithm we ran the development phase component several
times until we obtained an accuracy of 3%, with a 3% confidence interval. Figure 16 shows
the execution time of the Compute-Derived-Invariants algorithm for these two
data sets. As the only difference is the Chk_Imp Algorithm instance employed,
the x-axis is labeled with those instances.

Note that the x-axis uses a logarithmic scale and hence we may conclude that
execution time increases linearly with the instance number until instance 4096, and
increases exponentially after that. However, we have observed that all instances
starting from instance 4 produced the same final set of Derived Invariants:
18 invariants for the spatial domain, and 15 invariants for the relational domain.
For the relational domain invariants, the execution time increases more rapidly
than in the spatial case. The observed time increase is due to the time spent in
detecting failure. Memory overflows prevented us from running experiments with
larger threshold values.



[Figure: execution time of the development phase component for each Chk_Imp instance]

Fig. 16. Execution Time of Compute-Derived-Invariants



6.3 Performance Evaluation of the Deployment Phase Component 

For the performance evaluation of the deployment phase component, we ran experiments
to evaluate both the execution times of the merging algorithms and the net
savings obtained by the algorithms. We describe the experimental setting in
detail below.

In the experiments we assume a hdb agent that accesses relational and spatial
(PR-quadtree) data sources. We have built cost estimation modules for these two
sources where the cost calculations are similar to those of [Salzberg 1988]. We
also built an agent cost module which coordinates with the above two modules to
estimate the cost of a code call condition. The individual cost estimation modules
report the cost and the cardinality of their code call conditions to the agent cost
module. The agent cost model also includes network costs. For the experiments,
it is assumed that the data sources and the agent are on a fast Ethernet LAN. We
created a synthetic database schema given below, and used the cost estimates in
the experiments.

supplier (sname, pno, quantity) 
product (pno, price, color) 
map (name, x-location, y-location) 
purchases (customer_name, pno)

We used the ccc templates given in Table 6 in the experiments. In Table 6,
Op = {<, =, >}. Note that the last entry in Table 6 involves only relational data
sources.

Code Call Condition Template

(1) in(T1, rdb1 : select2(supplier, pno, "=", Val1, qty, Op1, Val2)) &
    in(P, quadtree : range(map, X, Y, Rad)) & =(T1.sname, P.name) &
    in(T2, rdb2 : select(product, price, Op2, Val3)) & =(T1.pno, T2.pno)

(2) in(T1, rdb1 : select2(supplier, pno, "=", Val1, qty, Op1, Val2)) &
    in(P, quadtree : range(map, X, Y, Rad)) & =(T1.sname, P.name) &
    in(T2, rdb2 : rngselect(product, price, Val3, Val4)) & =(T1.pno, T2.pno)

(3) in(T1, rdb1 : rngselect(supplier, qty, Val1, Val2)) & =(T1.pno, Val3) &
    in(P, quadtree : range(map, X, Y, Rad)) & =(T1.sname, P.name) &
    in(T2, rdb2 : select(product, price, Op2, Val3)) & =(T1.pno, T2.pno)

(4) in(T1, rdb1 : rngselect(supplier, qty, Val1, Val2)) & =(T1.pno, Val3) &
    in(T2, rdb2 : select(product, price, Op2, Val4)) & =(T1.pno, T2.pno) &
    in(T3, rdb3 : rngselect(purchases, pno, Val5, Val6)) & =(T1.pno, T3.pno)

Table 6. Query Templates Used in the Experiments

By changing constants in these template code call conditions, we have created
various commonality relationships. We have constructed the following three types
of code call condition sets by using the above templates.

Type 1: Such sets of code call conditions only contain equivalent code call con- 
ditions. 

Type 2: Such sets of code call conditions contain only equivalent and implied
code call conditions.

Type 3: Such sets of code call conditions contain equivalent, implied and overlap- 
ping code call conditions. 

Before describing the experiments, let us first define the metrics we use in these 
sets of experiments. 

Definition 6.1 (Savings Percentage). Let C_cost be the initial total cost of the set
of code call conditions, i.e., the sum of the individual code call condition costs,
fin_cost be the cost of the global merged code call condition produced by the merging
algorithm, IdCom_cost be the execution time of the Improved-CSI algorithm
and Merge_cost be the execution time of the merge algorithm employed. Then, the
savings percentage achieved by the merge algorithm is given by:

\[ \text{savings percentage} = \frac{C\_cost - fin\_cost - IdCom\_cost - Merge\_cost}{C\_cost} \]
We try to capture the net benefit of merging the code call conditions with the 
savings percentage metric. Moreover, in order to remedy the difference between 
high-cost code call conditions and low-cost code call conditions, we normalize the 
savings percentage metric. 
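
As a small worked illustration of the savings percentage of Definition 6.1, the sketch below computes the ratio for two sets of made-up numbers. The parameter names and all values are assumptions chosen for illustration; they are not measurements from the experiments.

```python
def savings_percentage(c_cost, fin_cost, idcom_cost, merge_cost):
    """Net savings as a fraction of the initial total cost (multiply by 100 for percent)."""
    if c_cost <= 0:
        raise ValueError("c_cost must be positive")
    return (c_cost - fin_cost - idcom_cost - merge_cost) / c_cost


# Hypothetical case 1: cheap optimization, so most of the raw saving is kept.
cheap = savings_percentage(c_cost=200, fin_cost=120, idcom_cost=6, merge_cost=4)

# Hypothetical case 2: a better plan found by a very expensive search.
expensive = savings_percentage(c_cost=100, fin_cost=60, idcom_cost=5, merge_cost=80)
```

The first case yields 0.35. The second yields -0.45: the merged plan itself is cheaper than the original, but the time spent finding it outweighs the saving, which is how a net savings figure can become negative.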

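Definition 6.2 (Sharing factor). Let $C = \{C_1, \ldots, C_N\}$ be the set of given code call
conditions. Let $[\chi_1], \ldots, [\chi_m]$ be equivalence relations, where each $[\chi_i]$ contains a set
of equivalent code call conditions and $card([\chi_i]) \geq 2$, $i = 1, \ldots, m$. Let $I = \{\chi_i$
such that $\chi_i \notin [\chi_j]$, $j = 1, \ldots, m$, and there exists at least one $\chi_k$ such that $\chi_i \rightarrow \chi_k\}$.

And finally let $O = \{(\chi_i, \chi_j, \chi_k)$ such that $\chi_i, \chi_j \notin [\chi_l]$, $l = 1, \ldots, m$, and $\chi_i, \chi_j \notin I$,
and $\chi_i \leftrightarrow \chi_j$, $\chi_i \rightarrow \chi_k$ and $\chi_j \rightarrow \chi_k\}$.

Then, the sharing factor of this set of code call conditions is given by:

\[ \sum_{i=1}^{m} card([\chi_i]) * card(\chi_i) + \sum_{\chi_i \in I} card(\chi_i) + \sum_{(\chi_i, \chi_j, \chi_k) \in O} card(\chi_k) \]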
The sharing factor basically gives the percentage of data objects shared among
code call conditions. The intuition behind this metric is that we expect to see
an increasing benefit from merging as the sharing among the code call conditions 
increases. In this metric, we try to avoid counting the cardinality of any code call 
condition more than once, so that the sharing factor is between 0% and 100%. 
In order to compare our algorithms with a well known algorithm [Shim et al.
1994] for merging multiple relational-database-only queries using the A* algorithm,
we implemented an adapted version of the A* algorithm of Shim et al. [1994]. We used
an improved version of their heuristic function. We adapted our Improved-CSI
algorithm to work with the A* algorithm. We enumerated the 8 most promising
execution plans for each individual ccc and input those plans to the A* algorithm.

[Sellis and Ghosh 1990] also uses similar measures. In their case, they only have
equivalent relationships, hence the sharing factor metric is trivially calculated. In 
their version of the savings percentage metric, they only measure the difference 
between initial cost and the final cost obtained by merging, and fail to take into 
account the cost of achieving that savings. Our experiments show that although 
the A* algorithm finds better global results, the cost of obtaining those results is 
so prohibitively high that the A* algorithm is often infeasible to use in practice. 

In all of the experiments, the algorithms are run several times to obtain results
that are accurate within plus or minus 3%, with a 3% confidence interval.



[Figure: execution time of the Improved-CSI algorithm versus number of ccc's in the set]

Fig. 17. Execution Time of Improved-CSI



6.3.1 The Execution Time of the Improved-CSI Algorithm. The Improved-CSI
algorithm was run with the three types of ccc sets. Figure 17 shows the
execution times of the algorithm as the number of ccc's in the set increases. As seen
from the figure, although the execution time grows exponentially, with a small slope, it is
on the order of seconds. It takes only 6 seconds for the Improved-CSI algorithm
to find all relationships in a set containing 20 queries. Moreover, the execution
time increases as more types of relationships exist in the ccc sets. It has the highest
execution time for Type 3 ccc sets, and the lowest execution time for Type 1 ccc
sets.

6.3.2 Savings Achieved by the Merge Algorithms. In these experiments, we investigate
the net savings the merge algorithms achieve for our three different types of
ccc sets, as well as for ccc sets involving only relational sources. We have 10 ccc's 
in each set. The reason for this is that the A* algorithm exhausts memory for ccc 
sets having more than 11-12 ccc's. 

Figures 18, 19 and 20 show the savings percentage achieved for Type 1, 2 and
3 ccc sets, respectively. As seen from Figure 18, the A* algorithm performs as
well as our merge algorithms once the sharing factor exceeds approximately 30%. 
We have not been able to run the A* algorithm for low sharing factors because 
of the memory problem. The A* algorithm has an effective heuristic function for 
equivalent ccc's, hence it is able to obtain high quality plans in a very short time. 
However, as seen from the figure, our merge algorithms are also able to achieve the 
same level of savings. 




Fig. 18. Net savings achieved with Type 1 ccc Sets

Fig. 19. Net savings achieved with Type 2 ccc Sets

Figure 19 shows the results when there are both equivalent and implied ccc
relationships. This time the heuristic function of the A* algorithm is not as effective
as with Type 1 ccc sets, and the net savings it achieves are negative until very high
sharing factors. Although the A* algorithm finds low cost global execution plans,
the execution time of the algorithm is so high that the net savings are negative.
Our merge algorithms achieve very good net savings percentages. All the selection
strategies perform almost equally well, with BFMerge3 performing slightly better.

Figure 20 shows the net savings obtained when all three types of relationships
exist in the ccc sets. Note that the A* algorithm only considers equivalent and 
implied relationships. The results are very similar to the previous experiment. 
Again, our merge algorithms perform much better than the A* algorithm. Our 
different select strategies have similar performances, with BFMergeS performing 
the best. 

As the A* algorithm was devised only for relational data sources, we designed
another experiment involving only relational data sources. In these ccc sets,
we only allowed equivalent relationships, as the A* algorithm performed best with
equivalent ccc's. Figure 21 shows the net savings achieved in this case. As seen
from the figure, our algorithms perform as well as the A* algorithm for sharing
factors greater than 30%, and better for the rest.

Fig. 20. Net savings achieved with Type 3 ccc Sets




These results suggest that although our algorithms explore a smaller search space
than the A* algorithm, the savings we obtain in practice are as good as those of
the A* algorithm, and the high execution cost of the A* algorithm is
prohibitive.
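
As an illustration of why a better plan can still yield negative net savings, the following minimal sketch charges the optimizer's own running time against the plan it produces. It assumes that net savings is measured relative to the cost of executing the ccc's unmerged; the function name, the common time unit and the sample numbers are hypothetical and are not taken from our experiments.

```python
def net_savings_percentage(unmerged_cost, merged_plan_cost, optimization_time):
    """Percentage saved once the optimizer's own running time is charged.

    Assumption: all three quantities are expressed in the same time unit.
    """
    if unmerged_cost <= 0:
        raise ValueError("unmerged_cost must be positive")
    total_with_merging = merged_plan_cost + optimization_time
    return 100.0 * (unmerged_cost - total_with_merging) / unmerged_cost


# Hypothetical numbers: a cheap heuristic that finds a slightly worse plan
# versus an exhaustive search that finds a better plan but runs far longer.
heuristic = net_savings_percentage(1000.0, 620.0, 5.0)
exhaustive = net_savings_percentage(1000.0, 580.0, 900.0)
print(f"heuristic: {heuristic:.1f}%  exhaustive: {exhaustive:.1f}%")
```

With these made-up inputs the exhaustive search produces the cheaper plan, yet its net savings is negative because the search time exceeds the execution cost it removes.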

6.3.3 Execution Times of Merge Algorithms. In these experiments, we studied
the execution times of our Merge algorithms and the A* algorithm. Figures 22, 23
and 24 show the execution times for Type 1, Type 2 and Type 3 ccc sets as the
number of ccc's in the sets increases. Note that the y-axes in the figures
have logarithmic scale. As seen from the figures, the A* algorithm has
double-exponential execution time, and it cannot handle ccc sets having more than
10-11 ccc's, as it exhausts memory. The results show that our algorithms run (1) 1300
to 5290 times faster than the A* algorithm for Type 1 ccc sets, (2) 1360 to 6280
times faster than the A* algorithm for Type 2 ccc sets, and (3) 100 to 350 times
faster than the A* algorithm for Type 3 ccc sets.

Fig. 21. Net savings achieved with relational sources

Fig. 22. Execution Time of Merge Algorithms with Type 1 ccc Sets

The execution times of our Merge algorithms are exponential, but in the order
of milliseconds, taking less than a second for even 20 ccc's. Among our algorithms,
BFMergeS has the highest execution time, as it uses an expensive heuristic and
explores a relatively larger search space than the DFMerge algorithms. DFMergeS
has the next highest execution time, and DFMergel has the lowest. One important
observation is that although BFMergeS and DFMergeS use a relatively
expensive and more informed heuristic, and therefore have higher execution times
and find better global execution plans, they achieve the same level of net savings
as the other strategies. Hence, the increased cost induced by these two strategies
is not offset by the net savings they achieve.



6.3.4 Final Cost of Plans Generated by the Merge Algorithms. As the A* algorithm
examines an exhaustive search space, we studied the quality of plans generated
by our Merge algorithms and the A* algorithm to determine how suboptimal
our final plans are. For this purpose, we examined the final costs of the plans for
our Type 3 ccc sets. Figure 25 shows the estimated execution costs of the final plans
generated by the Merge algorithms. As seen from the figure, the A* algorithm
almost always finds better plans than our algorithms. However, the time it spends
in finding those quality plans is not offset by the net savings it achieves. Although
our algorithms explore only a restricted search space, the results show that they are
able to compute plans whose costs are at most 10% more than the costs of the plans
produced by the A* algorithm. From these results, we can conclude that our
algorithms are both feasible and practical.

Fig. 23. Execution Time of Merge Algorithms with Type 2 ccc Sets

Fig. 24. Execution Time of Merge Algorithms with Type 3 ccc Sets



7. RELATED WORK 

Our work has been influenced by and is related to various areas of research. Over the
last decade, there has been increasing interest in building information agents that
can access a set of diverse data sources. These systems include HERMES [Adali
et al. 1996], SchemaSQL [Lakshmanan et al. 1996; Lakshmanan et al. 1999], TSIMMIS
[Chawathe et al. 1994; Garcia-Molina et al. 1997], SIMS [Arens et al. 1993],
Information Manifold [Levy et al. 1996b; Levy et al. 1996a], The Internet Softbot
[Etzioni and Weld 1994], InfoSleuth [Bayardo et al. 1997], Infomaster [Genesereth
et al. 1997], and ARIADNE [Ambite et al. 1998]. Although all these systems provide
mechanisms to optimize individual requests, the only one which addresses the
problem of optimizing overall agent performance is ARIADNE.

Fig. 25. Final Plan Costs Generated for Type 3 ccc Sets

In [Ashish 1998; Ashish et al. 1999], the authors propose techniques to selectively
materialize data to improve the performance of subsequent requests. They use the
LOOM [MacGregor 1990] knowledge representation language for modeling data and
maintain an ontology of classes of information sources in LOOM. They determine
what to materialize by examining previous user requests as follows. They first look
at the constraints imposed by user queries and create subclasses in the ontology
corresponding to these restrictions. They then try to merge subclasses whenever
possible. After all user queries have been examined, they sort these subclasses
according to the frequency of requests and materialize subclasses from this list
until the space reserved for materialization is exhausted. They repeat this process
at fixed intervals. Their idea is similar to previous semantic caching ideas [Adali
and Subrahmanian 1995; Adali et al. 1996; Dar et al. 1996]. In semantic caching,
the cache is organized into semantic regions instead of pages. When a new query
arrives, the contents of the cache are examined to determine what portion of the
data requested in the query is present in the cache. A query is then created to
retrieve the rest of the data from disk. The problem with semantic caching is that
containment checking is hard and having a large number of semantic regions creates
performance problems.
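
The splitting step of semantic caching can be sketched as follows. This is a deliberately simplified illustration that assumes every semantic region and every query is a half-open range over a single numeric attribute; the function `split_query` is hypothetical and does not reproduce the region model of the cited systems.

```python
def split_query(query, cached_regions):
    """Split a half-open range query into cached parts and a remainder.

    query: (lo, hi); cached_regions: list of (lo, hi) half-open intervals.
    Returns (hits, misses), each a sorted list of disjoint intervals.
    """
    q_lo, q_hi = query
    hits = []
    for r_lo, r_hi in sorted(cached_regions):
        lo, hi = max(q_lo, r_lo), min(q_hi, r_hi)
        if lo < hi:
            # Merge with the previous hit if the two pieces overlap or touch.
            if hits and lo <= hits[-1][1]:
                hits[-1] = (hits[-1][0], max(hits[-1][1], hi))
            else:
                hits.append((lo, hi))
    misses = []
    cursor = q_lo
    for lo, hi in hits:
        if cursor < lo:
            misses.append((cursor, lo))  # gap that must be fetched from disk
        cursor = max(cursor, hi)
    if cursor < q_hi:
        misses.append((cursor, q_hi))
    return hits, misses


hits, misses = split_query((10, 50), [(0, 20), (30, 40)])
print("answered from cache:", hits)
print("remainder query:", misses)
```

Even in this one-dimensional setting every cached region is inspected for every query, which hints at why a large number of semantic regions becomes a performance problem.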

Both [Ashish et al. 1999] and [Dar et al. 1996] process one query at a time,
and try to reduce the execution time by using caches. In contrast, we examine a set
of requests and try to optimize the overall execution time of this set by exploiting
the commonalities between them. Moreover, in our framework agents can be built on
top of legacy software code bases such as PowerPoint, Excel, route planners, etc.,
which may not support a database style query language. Since we process a set
of requests simultaneously, we cache the results of a code call condition evaluation
only if another code call condition in this set can make use of the cached results. On
the other hand, in [Ashish et al. 1999] caching decisions are based on user request
histories. The advantage of their approach is that they can make use of the cache
for a longer period of time, while in our case the cache contents are valid only during
the execution of a particular set of code call conditions. When we process the next
batch of code call conditions, we discard the contents of the cache. The disadvantage
of history based caching, however, is that it cannot rapidly adapt to
changes in interests. Nevertheless, we believe that incorporating more global level
caching techniques, like the ones in [Ashish et al. 1999], into our framework is a
promising research area that is worth pursuing. Another important difference is
that our results also include soundness and completeness theorems.
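
The batch-scoped caching policy described above can be illustrated with a small sketch. It assumes that each request is simply a list of opaque subexpression keys and that `evaluate` is a caller-supplied function; both are hypothetical simplifications and not the data structures of our implementation.

```python
from collections import Counter


def run_batch(batch, evaluate):
    """Evaluate a batch of requests, caching only results that are shared.

    batch: list of requests, each a list of hashable subexpression keys.
    evaluate: function mapping a key to its result (assumed to be expensive).
    """
    # Count in how many requests of this batch each subexpression occurs.
    uses = Counter(key for request in batch for key in set(request))
    cache = {}
    results = []
    for request in batch:
        row = []
        for key in request:
            if key in cache:
                row.append(cache[key])
                continue
            value = evaluate(key)
            if uses[key] > 1:  # cache only if another request can reuse it
                cache[key] = value
            row.append(value)
        results.append(row)
    return results  # the cache is discarded when the batch is finished


calls = []


def evaluate(key):
    calls.append(key)  # record every real evaluation
    return key.upper()


results = run_batch([["a", "b"], ["b", "c"], ["a", "b"]], evaluate)
print(results, calls)
```

Here "c" is never cached because no other request in the batch needs it, while "a" and "b" are evaluated once and reused.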

The problem of simultaneously optimizing and merging a set of queries has
been studied within the context of relational and deductive databases [Grant and
Minker 1980; Sellis 1988; Shim et al. 1994; Sellis and Ghosh 1990; Finkelstein 1982;
Chakravarthy and Minker 1985]. [Grant and Minker 1980; Sellis 1988; Sellis and
Ghosh 1990; Shim et al. 1994] address the problem of creating a globally optimal
access plan for a set of queries, provided that the common expressions among
the queries are given as input. [Grant and Minker 1980] describe a branch-and-bound
algorithm which searches a state space in a depth-first manner to optimize
a set of relational expressions. Their algorithms are not cost-based, and hence they
may increase the total execution cost of the queries. Moreover, they consider only
equivalence relationships, not containment relationships, and they deal only
with relational sources, not with non-database sources.



[Sellis 1988; Shim et al. 1994; Sellis and Ghosh 1990] propose exhaustive
algorithms to create a globally optimal execution plan for a set of relational database
queries. [Sellis and Ghosh 1990] show that the multiple-query optimization (MQO)
problem in relational databases is NP-hard even when only equivalence relationships
are considered. Hence, exact algorithms for MQO are not practical and therefore
approximations or heuristic algorithms are worth pursuing.

[Sellis 1988] formulates the MQO problem as a state search problem and uses the
A* algorithm. In their approach, a state is defined as an n-tuple
$\langle P_{1j_1}, P_{2j_2}, \ldots, P_{nj_n} \rangle$,
where $P_{ij_i} \in \{NULL\} \cup P_i$ and $P_i$ is the set of possible access plans for query
$Q_i$. The initial state is the vector $\langle NULL, \ldots, NULL \rangle$, that is, no access plan
is chosen for any query. A state transition chooses an access plan for the next
query whose corresponding access plan is NULL in the state vector. The heuristic
function proposed by [Sellis 1988] takes only equivalence relationships into account.
[Shim et al. 1994] improve and extend this heuristic function by incorporating
implication relationships and by modifying the estimated costs. This improved
heuristic function provides a tighter bound than the one proposed in [Sellis 1988].
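
The state space described above can be made concrete with a small sketch. It assumes a simplified cost model in which every access plan is a set of named tasks, a task shared by several chosen plans is paid for once, and the heuristic charges each remaining task its cost divided by the number of queries that could use it. The task names, costs and the function `astar_mqo` are hypothetical, and the heuristic is only in the spirit of the equivalence-based one, not a reproduction of the cited algorithms.

```python
import heapq
from itertools import count


def astar_mqo(plans, task_cost):
    """A* over states (j_1, ..., j_n), where None plays the role of NULL.

    plans[i] lists the alternative access plans of query i; each plan is a
    frozenset of task names. Returns (chosen plan indices, total cost).
    """
    n = len(plans)
    # users[t]: number of queries that have task t in at least one plan.
    users = {}
    for query_plans in plans:
        for t in set().union(*query_plans):
            users[t] = users.get(t, 0) + 1

    def h(next_q, done):
        # Lower bound: cheapest plan per unassigned query, shared tasks split.
        return sum(
            min(sum(task_cost[t] / users[t] for t in p if t not in done)
                for p in plans[q])
            for q in range(next_q, n)
        )

    tie = count()  # tie-breaker so the heap never compares states
    frontier = [(h(0, frozenset()), next(tie), 0.0, (None,) * n, frozenset())]
    while frontier:
        _, _, g, state, done = heapq.heappop(frontier)
        if None not in state:
            return state, g
        q = state.index(None)  # next query whose plan is still NULL
        for j, p in enumerate(plans[q]):
            new_g = g + sum(task_cost[t] for t in p - done)
            new_done = done | p
            new_state = state[:q] + (j,) + state[q + 1:]
            f = new_g + h(q + 1, new_done)
            heapq.heappush(frontier, (f, next(tie), new_g, new_state, new_done))
    return None


task_cost = {"scan_R": 10, "scan_S": 8, "join_RS": 5, "index_R": 4, "filter": 1}
plans = [
    [frozenset({"scan_R", "filter"}), frozenset({"index_R"})],  # query 0
    [frozenset({"scan_R", "scan_S", "join_RS"})],               # query 1
]
result = astar_mqo(plans, task_cost)
print(result)
```

In this toy instance the locally cheaper index plan for query 0 is rejected, because the scan of R can be shared with query 1. Note that every alternative plan of every query must be supplied up front.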

However, their approach requires enumeration of all possible plans for each query, 
leading to a (theoretically) very large search space. As a result, these algorithms 
have an exponential worst case running time. Moreover, in a heterogeneous envi- 
ronment, it may not be possible to assume that all query plans can be enumerated 
since queries might have infinitely many access plans. Furthermore, application 
program interfaces of individual data sources and/or software packages may not 
enumerate all such plans for requests shipped to them. This may be because (i) 
their internal code does not support it, or (ii) they are not willing to do so.
While [Sellis 1988; Shim et al. 1994; Sellis and Ghosh 1990] focus only on
relational data sources, we address the problem of optimizing a set of code call
conditions in agents which are built on top of arbitrary data sources. For this purpose,
we provide a framework to define and identify common subexpressions for arbitrary
data sources. Moreover, we do not need to enumerate all possible plans of a single
query. We have implemented an adapted version of the A* algorithm of [Shim et al.
1994] and compared it with our merging algorithms. As the results in Section 6
show, our merging algorithms are much faster than the A*-based algorithm. As the
A*-based algorithm examines a larger search space, it may find low-cost plans that
our merging algorithms may miss. However, the time it takes to find such good
plans is usually not offset by the savings it achieves.

[Finkelstein 1982; Chakravarthy and Minker 1985], on the other hand, focus on
detecting common expressions among a set of queries in relational and deductive 
databases. Since the notion of "common subexpression" varies for different data 
sources, the common expression identification problem for agents is very different 
from those of relational and deductive databases. Furthermore, they only consider 
equivalence and containment relationships among queries when detecting common 
subexpressions, whereas we also consider overlapping cases. 
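
To make the three kinds of relationships concrete, the following sketch classifies a pair of requests under the simplifying assumption that each request selects a non-empty half-open integer range. Real code call conditions are not restricted to ranges; the function `classify` and its labels are purely illustrative.

```python
def classify(a, b):
    """Classify two non-empty half-open ranges [lo, hi) by their answer sets."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    if a == b:
        return "equivalent"
    if b_lo <= a_lo and a_hi <= b_hi:
        return "a contained in b"
    if a_lo <= b_lo and b_hi <= a_hi:
        return "b contained in a"
    if max(a_lo, b_lo) < min(a_hi, b_hi):
        return "overlapping"
    return "disjoint"


for pair in [((0, 10), (0, 10)), ((2, 5), (0, 10)),
             ((0, 10), (5, 15)), ((0, 5), (5, 10))]:
    print(pair, "->", classify(*pair))
```

An approach limited to equivalence and containment would treat the third pair as unrelated, although half of each answer set is shared.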

The only work that addresses heterogeneity, and hence is most closely related
to ours, is that of [Subramanian and Venkataraman 1998]. The authors propose
an architecture to process complex decision support queries that access a set of
heterogeneous data sources. They introduce transient views, which are materialized
views that exist during the execution of a query. [Subramanian and Venkataraman
1998] describe algorithms which analyze the query plan generated by an optimizer
to identify similar sub-plans, combine them into transient views and insert filters
for compensation. Moreover, [Subramanian and Venkataraman 1998] present the
implementation of their algorithms within the context of DataJoiner's [Gupta and
Lin 1994; Venkataraman and Zhang 1998] query optimizer. They try to optimize
a complex decision support query by exploiting common subexpressions within
this single query, whereas we try to simultaneously optimize a given set of requests.
While they examine relational-style operators in detecting common subexpressions,
we process any code call condition defined over arbitrary data sources, not just
relational sources. Moreover, they do not have a language to describe equivalence
and containment relationships for heterogeneous data sources and hence these
relationships are fixed a priori in the optimizer code. On the other hand, we provide
invariants to describe relationships for heterogeneous data sources. Our algorithms
for merging multiple code call conditions take such invariants and cost information
into account when performing the merge.

Another area of research that is related to ours is partial evaluation in logic
programs [Leuschel et al. 1998; Lloyd and Shepherdson 1991; De Schreye et al.
1999]. Partial evaluation takes a program and a goal and rewrites the program by
using a set of transformations to optimize its performance. The rewritten program
usually runs faster for the particular goal when SLD or SLD-NF resolution is used
for query processing. On the other hand, our framework takes an agent program
and a set of derived invariants, and tries to optimize the agent program a priori,
that is, at development time, prior to the occurrence of state changes. An interesting
research problem in our framework may be the following: If a state change can be
encoded as a goal, then we can use partial evaluation techniques to further optimize
the rewritten agent program, as shown in Figure 26. We believe this problem needs
further attention and research.




Fig. 26. Application of Partial Evaluation Techniques to Agent Programs 



Another area of research that is very closely related to ours is query optimization
in relational and deductive databases [Graefe 1995; Haas et al. 1989; Ioannidis and
Kang 1990; Graefe 1993; Ibaraki and Kameda 1984; Kim 1982; Mumick et al. 1996],
and in mediators [Adali et al. 1996; Levy et al. 1996a; Haas et al. 1997; Ambite
and Knoblock 2000; Duschka et al. 2000]. It is worth noting that this list is not
exhaustive since over the last decades, enormous effort has been devoted to the query
optimization problem. Our work is orthogonal to techniques for optimizing individual
queries, as they can be incorporated into our framework in numerous ways.
For example, individual requests might first be optimized by using the techniques
in [Levy et al. 1996a] or [Ambite and Knoblock 2000], then our techniques might
be applied to the results. However, our focus in this paper is on the simultaneous
optimization of a set of requests.

Finally, the problem of choosing appropriate materialized views to answer queries
is also related to our work and there exist several papers in this area [Qian 1996;
Levy et al. 1995; Chaudhuri et al. 1995]. [Levy et al. 1995] describe algorithms
to determine the portions of a query that can be expressed using the definitions of
materialized views. [Chaudhuri et al. 1995] identify portions of a query that can
be answered using materialized views, and determine if it is efficient to answer the
query using the view. The focus of such techniques is to efficiently compute the
answers to a single query, whereas our focus is to optimize the overall cost of a set
of requests submitted to a heavily loaded agent.



8. CONCLUSION 

There is now an incredible increase in the amount of research being conducted 
on software agents. Software agents now provide a host of web based services, 
ranging from creating personalized newspapers for people, to building multimedia 
presentations. In addition, agents for corporate web sites often try to personalize 
the web site for a given user by tracking histories of that user's interest. Agents 
are also being increasingly used in the aerospace and defense industries. 

When an agent gets lots of requests within a short time frame, the standard 
mechanism that most agent frameworks use is to queue the requests in accordance 
with some queueing policy (e.g. LIFO, FIFO, priority queue) and then service the 
requests one after the other. This often leads to long wait times for requests that
occur later on in the queue. In this paper, we have shown how to improve the 
performance of an agent by merging a set of requests and servicing the requests 
simultaneously. We proposed a generic and customizable framework for this pur- 
pose. Our solution applies to agents that are built on top of legacy code, which is 
certainly a practical assumption, as the success of the agent endeavor rests on the 
ability to build on top of existing data and software sources. Our solution consists 
of two parts: 

(1) identifying "commonalities" among a set of code call conditions and 

(2) computing a single global execution plan that simultaneously optimizes the 
total expected cost of this set of code call conditions. 

We first provided a formal framework within which an agent developer can specify 
what constitutes a "common subexpression" for a data source via a set of structures, 
called invariants. Invariants describe (1) code call conditions that are "equivalent" 
to other code call conditions, (2) code call conditions that are "contained" in other 
code call conditions, and (3) code call conditions that overlap with other code call 
conditions. Moreover, such invariants may imply other invariants. We developed 
provably sound and complete algorithms to take the initial set of invariants input 
by the developer and compute all implied invariants. 

Second, we provided an architecture to merge multiple requests in agents. We 
provided algorithms to identify equivalent, implied and overlapped code call con- 
ditions in any set C. We then proposed two heuristic based algorithms, BFMerge 
and DFMerge, that take as input, the set of code call conditions, and produce as 
output, a single execution plan. The merging decisions are based on costs, hence 
the resulting global plan is guaranteed to have a reduced cost. 

We have experimentally shown that our algorithms achieve significant savings. 
We have compared our merging algorithms with Sellis' A*-based algorithm (which
applies only to merging multiple requests in the relational database case) and
demonstrated that our algorithms almost always outperform theirs. We have shown
that our merging algorithms (1) can handle more than twice as many simultaneous 
code call conditions as the A* algorithm and (2) run 100 to 6300 times faster than 
the A* algorithm and (3) produce execution plans the cost of which is at most 10% 
more than the plans generated by the A* algorithm. 

We conclude with a brief remark on an important piece of future work. Eiter
et al. [Eiter et al. 2000] have developed a class of agents called regular agents. In
their framework, the semantics of an agent is given by computing certain kinds of
semantic constructs called "status sets." When an agent experiences a state change
(which may occur, for example, when it receives a message), the agent computes a new
"status set" having some properties described in [Eiter et al. 1999]. This "status
set" specifies what the agent is supposed to do in order to respond to the state
change. [Subrahmanian et al. 2000] show that this framework is rich enough to deal
not only with reactive agent behavior [Kowalski and Sadri 1999], but also with the
so-called autonomous agent behavior of the type described by Shoham [Shoham 1993;
Shoham 1999]. The regular agent framework of Eiter et al. [Eiter et al. 2000] reduces
the problem of computing "status sets" of regular agents to that of evaluating a set of
code call conditions. The beauty of their result is that the syntactic restrictions on
regular agents make it possible to associate with each agent, prior to deployment
of the agent, a set of code call conditions. Whenever the agent needs to find a
new status set in response to a state change, it recomputes a new status set by
evaluating this set of code call conditions. Hence, all the techniques described in
this paper may be used to optimize, once and for all, this set of code call conditions,
so that once the agent is deployed, this optimized set of code call conditions is used
by the agent for "status set" computations. We are pursuing this research avenue.
APPENDIX

A. PROOFS OF THEOREMS 



Proof of Theorem 2.10 



($\Rightarrow$): Suppose $\pi$ is a witness to the safety of $\chi$. There are two cases:

Case 1: Let $\chi_{\pi(i)}$ be an atomic code call condition of the form
$\mathbf{in}(X_{\pi(i)}, cc_{\pi(i)})$. Then, by the definition of safety,
$root(cc_{\pi(i)}) \subseteq RV_{\pi}(i)$, where
$RV_{\pi}(i) = \{root(Y) \mid \exists j < i \text{ s.t. } Y \text{ occurs in } \chi_{\pi(j)}\}$,
and either $X_{\pi(i)}$ is a root variable or $root(X_{\pi(i)}) \in RV_{\pi}(i)$.
Then, there exist $\chi_{\pi(j_1)}, \chi_{\pi(j_2)}, \ldots, \chi_{\pi(j_k)}$, $j_k < i$,
such that $root(X_{\pi(j_k)}) \subseteq RV_{\pi}(i)$ and
$root(X_{\pi(j_k)}) \subseteq root(cc_{\pi(i)})$. But then $\chi_{\pi(i)}$ is dependent
on each of $\chi_{\pi(j_1)}, \chi_{\pi(j_2)}, \ldots, \chi_{\pi(j_k)}$, $j_k < i$, by
definition. Hence, there exist edges

$(\chi_{\pi(j_1)}, \chi_{\pi(i)}), (\chi_{\pi(j_2)}, \chi_{\pi(i)}), \ldots, (\chi_{\pi(j_k)}, \chi_{\pi(i)})$.

Therefore, $\chi_{\pi(j_1)}, \chi_{\pi(j_2)}, \ldots, \chi_{\pi(j_k)}$, $j_k < i$, precede
$\chi_{\pi(i)}$, hence $\pi$ is also a topological sort of the cceg of $\chi$.

Case 2: If $\chi_{\pi(i)}$ is an equality/inequality of the form $s_1\ op\ s_2$, then at
least one of $s_1, s_2$ is a constant or a variable $S$ such that
$root(S) \in RV_{\pi}(i)$. Suppose at least one of $s_1, s_2$ is a variable. Then, there
exists a $\chi_{\pi(j)}$, $j < i$, such that $root(S) \in root(X_{\pi(j)})$, as
$root(X_{\pi(j)}) \subseteq RV_{\pi}(i)$. But then $\chi_{\pi(i)}$ is dependent on
$\chi_{\pi(j)}$ by definition, and there exists an edge $(\chi_{\pi(j)}, \chi_{\pi(i)})$ in
the cceg of $\chi$. Hence, $\chi_{\pi(j)}$ precedes $\chi_{\pi(i)}$ in the topological sort
of the cceg. If both $s_1$ and $s_2$ are constants, then their nodes have in-degree 0
in the cceg, and no code call condition needs to precede $\chi_{\pi(i)}$ in the
topological sort order, i.e., they are unrestricted. Therefore, $\pi$ is also a
topological sort of the cceg of $\chi$.

($\Leftarrow$): Suppose $\pi$ is a topological sort of the cceg of $\chi$. Let

$\chi_{\pi(i)}, \chi_{\pi(j_1)}, \chi_{\pi(j_2)}, \ldots, \chi_{\pi(j_k)}$, $j_k < i$,

be code call conditions such that there exist edges

$(\chi_{\pi(j_1)}, \chi_{\pi(i)}), (\chi_{\pi(j_2)}, \chi_{\pi(i)}), \ldots, (\chi_{\pi(j_k)}, \chi_{\pi(i)})$

in the cceg of $\chi$. Then, by definition, $\chi_{\pi(i)}$ depends on each
$\chi_{\pi(j_m)}$, $m = 1, \ldots, k$. If $\chi_{\pi(i)}$ is an atomic code call condition
of the form $\mathbf{in}(X_{\pi(i)}, cc_{\pi(i)})$, then
$root(X_{\pi(j_m)}) \subseteq root(cc_{\pi(i)})$, $m = 1, \ldots, k$. As
$\forall j_m, m = 1, \ldots, k$, $j_m < i$, $root(X_{\pi(j_m)}) \subseteq RV_{\pi}(i)$ by
definition of $RV_{\pi}(i)$, hence $root(cc_{\pi(i)}) \subseteq RV_{\pi}(i)$. On
the other hand, if $\chi_{\pi(i)}$ is an equality/inequality of the form $s_1\ op\ s_2$,
then either $s_1$ is a variable and $root(s_1) \in root(X_{\pi(j_m)})$, where
$j_m \in \{j_1, \ldots, j_k\}$, or $s_2$ is a variable and
$root(s_2) \in root(X_{\pi(j_{m'})})$, where $j_{m'} \in \{j_1, \ldots, j_k\}$, or both. But
$root(X_{\pi(j_m)}) \subseteq RV_{\pi}(i)$, $\forall j_m, m = 1, \ldots, k$, $j_m < i$. Hence,
$root(s_1), root(s_2) \in RV_{\pi}(i)$. If both $s_1$ and $s_2$ are constants, then they
are unrestricted in the topological sort. Therefore, $\pi$ is also a witness to the
safety of $\chi$. □
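
The correspondence between safe orderings and topological sorts can be checked mechanically on a small instance. The sketch below makes several simplifying assumptions: variables are plain names (root and path variables are not distinguished), each atom is described only by the variables it needs and the variables it binds, and every variable is bound by exactly one atom. The atom names and helper functions are hypothetical.

```python
from graphlib import TopologicalSorter
from itertools import permutations

# Each atom: name -> (variables it needs, variables it binds).
atoms = {
    "chi1": (set(), {"X"}),       # e.g. in(X, d:f(5)) binds X
    "chi2": ({"X"}, {"Y"}),       # e.g. in(Y, d:g(X)) needs X, binds Y
    "chi3": ({"X", "Y"}, set()),  # e.g. X <= Y needs both X and Y
}


def build_cceg(atoms):
    """Map each atom to the set of atoms it depends on (its predecessors)."""
    producers = {}
    for name, (_, outs) in atoms.items():
        for v in outs:
            producers.setdefault(v, set()).add(name)
    return {name: {p for v in ins for p in producers.get(v, ())} - {name}
            for name, (ins, _) in atoms.items()}


def is_safe_order(order, atoms):
    """True if every atom only needs variables bound by earlier atoms."""
    bound = set()
    for name in order:
        ins, outs = atoms[name]
        if not ins <= bound:
            return False
        bound |= outs
    return True


def is_topological_sort(order, graph):
    """True if every atom appears after all atoms it depends on."""
    pos = {name: i for i, name in enumerate(order)}
    return all(pos[d] < pos[n] for n, deps in graph.items() for d in deps)


graph = build_cceg(atoms)
order = list(TopologicalSorter(graph).static_order())
agree = all(is_safe_order(p, atoms) == is_topological_sort(p, graph)
            for p in permutations(atoms))
print(order, agree)
```

On this instance the two notions coincide for every permutation of the atoms, as the theorem states for the general setting.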



Proof of Theorem 2.20. The proof is by induction on the structure of
condition lists.

Base Cases: Base cases are when the condition list consists of $t_1\ Op\ t_2$ where
$Op \in \{<, >, \le, \ge, =\}$ and each of $t_1, t_2$ is either a variable or a constant. We
suppress the cases when both $t_1, t_2$ are constants: the relation either holds (in
that case we can eliminate $t_1\ Op\ t_2$) or it does not (in that case we can eliminate
the whole invariant).

$Op$ = "$\le, \ge$": We have to consider terms of the form $t_1 \le t_2$ (resp. $t_1 \ge t_2$) and
distinguish the following cases. For each case we define expressions $ie_1', ie_2'$
such that $true \Rightarrow ie_1'\ \Re\ ie_2'$ is equivalent to $t_1 \le t_2 \Rightarrow ie_1\ \Re\ ie_2$.

(1) $t_2$ is a constant $a$: Then $t_1$ is a variable. We modify $ie_1, ie_2$ by
introducing a new variable $X_{new}$ and adding the following ccc to all
subexpressions of $ie_1, ie_2$ containing $t_1$:

$\mathbf{in}(t_1,\ ag : subtraction(a, X_{new}))\ \&\ \mathbf{in}(1,\ ag : geq\_0(X_{new}))$.

We note that $t_1$ now becomes an auxiliary variable and $X_{new}$ is a base
variable. $Trans(ic, ie_i)$ is defined to be the modified $ie_i$ just described.

(2) $t_1$ is a constant $a$: Then $t_2$ is a variable. We modify $ie_1, ie_2$ by
introducing a new variable $X_{new}$ and adding the following ccc to all
subexpressions of $ie_1, ie_2$ containing $t_2$:

$\mathbf{in}(t_2,\ ag : addition(a, X_{new}))\ \&\ \mathbf{in}(1,\ ag : geq\_0(X_{new}))$.

Again, $t_2$ becomes an auxiliary variable and $X_{new}$ is a base variable.
$Trans(ic, ie_i)$ is defined to be the modified $ie_i$ just described.

(3) Both $t_1, t_2$ are variables: We modify $ie_1, ie_2$ by introducing a new
variable $X_{new}$ and adding the following ccc to all subexpressions of $ie_1, ie_2$
containing $t_2$:

$\mathbf{in}(t_2,\ ag : addition(t_1, X_{new}))\ \&\ \mathbf{in}(1,\ ag : geq\_0(X_{new}))$.

Again, $t_2$ becomes an auxiliary variable and $X_{new}$ is a base variable.
$Trans(ic, ie_i)$ is defined to be the modified $ie_i$ just described.

The case $\ge$ is completely analogous: just switch $t_1$ with $t_2$. Note that the
above covers all possible cases, as any variable in the condition list must
be a base variable (see Definition 2.17).

$Op$ = "$<, >$": Analogous to the previous case, just replace "$ag : geq\_0(X_{new})$" by
"$ag : ge\_0(X_{new})$".

$Op$ = "$=$": If in $t_1 = t_2$ the term $t_1$ is a variable, then we replace each
occurrence of $t_1$ in $ie_1, ie_2$ by $t_2$. If $t_1$ is a constant and $t_2$ is a variable,
replace each occurrence of $t_2$ in $ie_1, ie_2$ by $t_1$.

Inductive Step: As the condition list is just a conjunction of the cases mentioned
above, we can apply our modifications of $ie_1, ie_2$ one after another. Once all
modifications have been performed, we arrive at an equivalent formula of the
form

$true \Rightarrow Trans(ic, ie_1)\ \Re\ Trans(ic, ie_2)$ □
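
The idea behind case (3) is that $t_1 \le t_2$ holds exactly when some slack $X_{new} \ge 0$ satisfies $t_2 = t_1 + X_{new}$. The sketch below checks this over a small integer domain. It assumes that the code call `addition` returns the singleton set containing the sum and that `geq_0` returns `{1}` exactly when its argument is non-negative; these return conventions are assumptions made for the illustration.

```python
def addition(a, b):
    """Assumed behaviour of ag:addition: the singleton set {a + b}."""
    return {a + b}


def geq_0(x):
    """Assumed behaviour of ag:geq_0: {1} if x >= 0, the empty set otherwise."""
    return {1} if x >= 0 else set()


def translated(t1, t2, slack_domain):
    """in(t2, ag:addition(t1, X_new)) & in(1, ag:geq_0(X_new)) for some X_new."""
    return any(t2 in addition(t1, x) and 1 in geq_0(x) for x in slack_domain)


values = range(-3, 4)   # candidate values for t1 and t2
slack = range(-6, 7)    # covers every possible difference t2 - t1
equivalent = all((t1 <= t2) == translated(t1, t2, slack)
                 for t1 in values for t2 in values)
print(equivalent)
```

The condition list entry has disappeared from the left of the implication and reappears as two extra code call conditions, which is what allows the invariant to be written with `true` as its condition.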



Proof of Corollary 2.21 



($\Rightarrow$): Let $inv : ic \Rightarrow ie_1\ \Re\ ie_2$ be an invariant. We can assume that $ic$ is in
DNF: $C_1 \vee C_2 \vee \ldots \vee C_m$. Thus we can write $inv$ as follows:

$\{C_i \Rightarrow ie_1\ \Re\ ie_2 \mid 1 \le i \le m\}$

Let $inv\theta$ be any ground instance of $inv$. If $(S, \theta) \models inv$, then either
$(C_1 \vee C_2 \vee \ldots \vee C_m)\theta$ evaluates to false in state $S$, or
$(ie_1)\theta\ \Re\ (ie_2)\theta$ is true in $S$. Assume that
$(C_1 \vee C_2 \vee \ldots \vee C_m)\theta$ evaluates to false, then each $(C_i)\theta$ has to
be false in $S$. Hence,

$(S, \theta) \models (C_i \Rightarrow ie_1\ \Re\ ie_2)$ for $1 \le i \le m$.

Assume $(C_1 \vee C_2 \vee \ldots \vee C_m)\theta$ evaluates to true in $S$. Then there exists at
least one $(C_i)\theta$ that evaluates to true in state $S$. Let
$T = \{(C_j)\theta \mid 1 \le j \le m\}$ be the set of conjunctions that are true in $S$.
As all other $(C_i)\theta \notin T$ evaluate to false,
$(S, \theta) \models (C_i \Rightarrow ie_1\ \Re\ ie_2)$ for $1 \le i \le m$ and $(C_i)\theta \notin T$.
But $(S, \theta) \models inv$, hence $(ie_1)\theta\ \Re\ (ie_2)\theta$ is true in $S$. As a result,
$(S, \theta) \models (C_j \Rightarrow ie_1\ \Re\ ie_2)$ for $1 \le j \le m$ and $(C_j)\theta \in T$.

Since each $C_i \Rightarrow ie_1\ \Re\ ie_2$ is an ordinary invariant, the result follows from
Theorem 2.20.

($\Leftarrow$): Assume that $(\forall C_i, 1 \le i \le m)$
$(S, \theta) \models true \Rightarrow Trans(C_i, ie_1)\ \Re\ Trans(C_i, ie_2)$
and suppose $(S, \theta) \models (C_1 \vee C_2 \vee \ldots \vee C_m)$. Then by Theorem 2.20,
$(\forall C_i, 1 \le i \le m)$ $(S, \theta) \models (C_i \Rightarrow ie_1\ \Re\ ie_2)$. There exists at
least one $(C_j)\theta$ which evaluates to true in $S$. But then
$(ie_1)\theta\ \Re\ (ie_2)\theta$ is true in state $S$. Hence, $(S, \theta) \models inv$. □
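
The propositional core of this argument, namely that $(C_1 \vee \ldots \vee C_m) \Rightarrow P$ holds exactly when every $C_i \Rightarrow P$ holds, can be verified exhaustively for small $m$. The sketch below treats each $C_i$ and the conclusion $P$ as opaque truth values, which abstracts away states and substitutions; the function names are illustrative.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def split_is_equivalent(m):
    """Check (C_1 or ... or C_m) => P against the conjunction of all C_i => P."""
    for *cs, p in product([False, True], repeat=m + 1):
        whole = implies(any(cs), p)
        parts = all(implies(c, p) for c in cs)
        if whole != parts:
            return False
    return True


checked = {m: split_is_equivalent(m) for m in range(1, 6)}
print(checked)
```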



Proof of Lemma 4.4 



(1) $Chk\_Imp(ie_1, ie_2)$

if and only if

for all states $S$ and all assignments $\theta$: $[ie_1]_{S,\theta} \subseteq [ie_2]_{S,\theta}$

if and only if

for all states $S$ and all assignments $\theta$: $true \Rightarrow ie_1\theta \subseteq ie_2\theta$

if and only if

$Chk\_Taut(true \Rightarrow ie_1 \subseteq ie_2)$.

(2) follows from Theorem 2.20 and Corollary 2.21. Note that it also holds for
invariants of the form $ic \Rightarrow ie_1 = ie_2$ because they can be written as two separate
invariants: "$ic \Rightarrow ie_1 \subseteq ie_2$" and "$ic \Rightarrow ie_1 \supseteq ie_2$". (3) is immediate by the very
definition. □



Proof of Proposition 4.5. We show that the containment problem [Ullman
1989] in the relational model of data is an instance of the problem of checking
implication between invariant expressions. The results then follow from Lemma 4.4
and the fact that the containment problem in relational databases is well known
to be undecidable.

To be more precise, we use the results in [Calvanese et al. 1998], where it has been
shown that in the relational model of data, the containment of conjunctive queries
containing inequalities is undecidable. It remains to show that this problem can be
reduced to our implication check problem between invariant expressions.

Let $relational : query(Q)$ be a code call that takes as input an arbitrary set of
subgoals corresponding to the conjunctive query $Q$ and returns as output the result
of executing $Q$.

Let $Q_1$ and $Q_2$ be arbitrary conjunctive queries which may contain inequalities.
We define

$ie_1 = relational : query(Q_1), \quad ie_2 = relational : query(Q_2)$.

Then, clearly

$Chk\_Imp(ie_1, ie_2) = true$ if and only if $Q_1 \subseteq Q_2$.

Hence the implication check problem is also undecidable. □



Proof of Proposition 4.6. Clearly, by Lemma 4.4 it suffices to prove the
proposition for $Chk\_Imp$.

For an invariant expression $ie$, the set of all substitutions $\theta$ such that $ie\theta$ is
ground is finite (because of our assumption about finiteness of the domains of
all datatypes). Thus, our atomic code call conditions $\mathbf{in}(obj,\ ag : f(args))$ can all
be seen as propositional variables. Therefore, using this restriction, we can view
our formulae as propositional formulae and a state corresponds to a propositional
valuation.

With this restriction, our problem is certainly in co-NP, because computing $[ie]_{S,\theta}$
is nothing but evaluating a propositional formula (the valuation corresponds to the
state $S$). Thus "$[ie_1]_{S,\theta} \subseteq [ie_2]_{S,\theta}$ for all $S$ and all assignments $\theta$" translates to
checking whether a propositional formula is a tautology: a problem known to be in
co-NP.

To show completeness, we use the fact that checking whether $C$ is a logical
consequence of $\{C_2, \ldots, C_n\}$ (where $C$ is an arbitrary clause and $\{C_2, \ldots, C_n\}$ an
arbitrary consistent set of clauses) is well known to be co-NP-complete.

We prove our proposition by a polynomial reduction of this problem to implication
between atomic invariant expressions.

Let $ie$ be an atomic invariant expression, i.e. an atomic code call condition: it
takes as input a set of clauses, and returns as output all valuations that satisfy
that set of clauses. Let $ANS(ie(\{C\}))$ denote the set of results of evaluating $ie$ on
$C$ with respect to a state $S$. Then

$ANS(ie(\{C\})) \subseteq ANS(ie(\{C_2, \ldots, C_n\}))$ if and only if $\{C_2, \ldots, C_n\} \models C$.

Hence, checking whether an arbitrary atomic invariant expression $ie_1$ implies
another atomic invariant expression $ie_2$ is co-NP hard. □



Lemma A.1 (Translation into predicate logic). There is a translation $Trans$ from
simple invariants $INV_{simple}$ into predicate logic with equality such that the following
holds:

$$\mathcal{I} \models inv \text{ if and only if } Trans(\mathcal{I}) \cup T_{ord} \models Trans(inv),$$

where $T_{ord}$ is the theory of strict total orders $<$, and $a \le b$ (resp. $a \ge b$) is an
abbreviation for "$a < b \vee a = b$" (resp. "$a > b \vee a = b$").

Moreover, a simple invariant "$ic_1 \Rightarrow ie_1 \subseteq ie_1'$" is translated into a formula of
the form

$$\forall\big(ic_1 \rightarrow \forall x\,(pred_{\langle d_1,f_1\rangle}(\ldots, x) \rightarrow pred_{\langle d_2,f_2\rangle}(\ldots, x))\big)$$

where $\forall$ denotes the universal closure with respect to all remaining variables. This
is a universally quantified formula.

Proof. We translate each simple invariant to a predicate logic formula by
induction on the structure of the invariant.

Code Calls: For each $n$-ary code call $d{:}f(\ldots)$ we introduce an $(n+1)$-ary predicate
$pred_{\langle d,f\rangle}(\ldots, \cdot)$. Note that we interpret $d{:}f(\ldots)$ as a set of elements. The
additional argument is used for containment in this set.

Atomic ccc's: We then replace each simple invariant expression

$$\mathbf{in}(X, d_1{:}f_1(\ldots)) \subseteq \mathbf{in}(Y, d_2{:}f_2(\ldots))$$

by the universal closure (with respect to all base variables) of the formula

$$\forall x\,(pred_{\langle d_1,f_1\rangle}(\ldots, x) \rightarrow pred_{\langle d_2,f_2\rangle}(\ldots, x)).$$

Simple Ordinary Invariants: A simple ordinary invariant of the form

$$ic_1 \Rightarrow ie_1 \subseteq ie_1'$$

is translated into

$$\forall\big(ic_1 \rightarrow \forall x\,(pred_{\langle d_1,f_1\rangle}(\ldots, x) \rightarrow pred_{\langle d_2,f_2\rangle}(\ldots, x))\big)$$

where $\forall$ denotes the universal closure with respect to all remaining variables.

Simple Invariants: A simple invariant of the form

$$(C_1 \vee C_2 \vee \ldots \vee C_m) \Rightarrow ie_1 \subseteq ie_1'$$

is translated into the following $m$ statements ($1 \le i \le m$)

$$\forall\big(C_i \rightarrow \forall x\,(pred_{\langle d_1,f_1\rangle}(\ldots, x) \rightarrow pred_{\langle d_2,f_2\rangle}(\ldots, x))\big)$$

where $\forall$ denotes the universal closure with respect to all remaining variables.

Note that according to the definition of a simple ordinary invariant and according
to the definition of a code call condition (in front of Example 2.1), $ic_1$ and the $C_i$
are conjunctions of equalities $s = t$ and inequalities $s \le t$, $s \ge t$, $s < t$, $s > t$ where
$s, t$ are real numbers or variables.

The statement

$$\mathcal{I} \models inv \text{ if and only if } Trans(\mathcal{I}) \cup T_{ord} \models Trans(inv)$$

is easily proved by structural induction on simple invariants and condition lists. □



Proof of Lemma 4.7. We use the translation of Lemma A.1.

The assumption $\not\models inv_2$ expresses that there is a state $S_0$ and a substitution $\theta$ of
the base variables in $inv_2$ such that $S_0 \models ic_2\theta$ and there is an object $a$ such that
$S_0 \models pred_{\langle d_2,f_2\rangle}(\ldots, a)\theta$ and $S_0 \not\models pred_{\langle d_2',f_2'\rangle}(\ldots, a)\theta$.

As $inv_1$ entails $inv_2$, $inv_1$ is not satisfied by $S_0$. Thus there is $\theta'$ such that
$S_0 \models ic_1\theta'$ and there is an object $a'$ with $S_0 \models pred_{\langle d_1,f_1\rangle}(\ldots, a')\theta'$ and
$S_0 \not\models pred_{\langle d_1',f_1'\rangle}(\ldots, a')\theta'$.

Now suppose $\langle d_1',f_1'\rangle \ne \langle d_2',f_2'\rangle$. Then we simply modify the state $S_0$ (note a state
is just a collection of ground code call conditions) so that $S_0 \models pred_{\langle d_1',f_1'\rangle}(\ldots, a')\theta'$.
We do this for all $\theta'$ that are counterexamples to the truth of $inv_1$. Because
$\langle d_1',f_1'\rangle \ne \langle d_2',f_2'\rangle$, this modification does not affect the truth of
$S_0 \models pred_{\langle d_2,f_2\rangle}(\ldots, a)\theta$ and $S_0 \not\models pred_{\langle d_2',f_2'\rangle}(\ldots, a)\theta$. But this is a
contradiction to our assumption that $inv_1$ entails $inv_2$. Thus we have proved:
$\langle d_1',f_1'\rangle = \langle d_2',f_2'\rangle$.

Similarly, we can also modify $S_0$ by changing the extension of $pred_{\langle d_1,f_1\rangle}(\ldots, a')\theta'$
and guarantee that $inv_1$ holds in $S_0$. So we also get a contradiction as long as
$\langle d_1,f_1\rangle \ne \langle d_2,f_2\rangle$. Therefore we have proved that $\langle d_1,f_1\rangle = \langle d_2,f_2\rangle$.

Our second claim follows trivially from $\langle d_1',f_1'\rangle = \langle d_2',f_2'\rangle$ and $\langle d_1,f_1\rangle = \langle d_2,f_2\rangle$. □



Proof of Lemma 4.9. Let $inv_1 : ic_1 \Rightarrow ie_1\ \Re_1\ ie_1'$ and $inv_2 : ic_2 \Rightarrow ie_2\ \Re_2\ ie_2'$.
Then by the computation performed by the Combine_1 algorithm, the derived
invariant has the following form

$$inv : simplify(ic_1 \wedge ic_2) \Rightarrow ie_1\ \Re\ ie_2',$$

where $\Re$ is determined by Table 1. If $simplify(ic_1 \wedge ic_2) = false$ we are done. In
this case, there is no state $S$ satisfying a ground instance of $ic_1 \wedge ic_2$.

We assume that we are given a state $S$ of the agent that satisfies $inv_1$, $inv_2$ and $\mathcal{I}$.
Let $inv_1\theta$ and $inv_2\theta$ be any ground instances of $inv_1$ and $inv_2$. Then, either $ic_1(\theta)$
evaluates to false, or $ie_1(\theta)\ \Re_1\ ie_1'(\theta)$ is true in $S$. Similarly, either $ic_2(\theta)$ is false
or $ie_2(\theta)\ \Re_2\ ie_2'(\theta)$ is true in $S$.

If either $ic_1(\theta)$ or $ic_2(\theta)$ evaluates to false, then $(ic_1 \wedge ic_2)(\theta)$ also evaluates to
false, and $inv$ is also satisfied. Let us assume both $ic_1(\theta)$ and $ic_2(\theta)$ evaluate to true.
Then so does $(ic_1 \wedge ic_2)(\theta)$, and both $ie_1(\theta)\ \Re_1\ ie_1'(\theta)$ and $ie_2(\theta)\ \Re_2\ ie_2'(\theta)$ are
true in $S$, as $S$ satisfies both $inv_1$ and $inv_2$. If $\Re$ = "$=$", then both $\Re_1$ = "$=$" and
$\Re_2$ = "$=$", $ie_1' \rightarrow ie_2$ and $ie_2 \rightarrow ie_1'$ (in all states satisfying $\mathcal{I}$ and $inv_1$, $inv_2$). Then,
we have $ie_1 = ie_1' = ie_2 = ie_2'$, hence $ie_1(\theta) = ie_2'(\theta)$ is true in $S$, and $S$ satisfies
$inv$. If $\Re$ = "$\subseteq$", then $ie_1' \rightarrow ie_2$ (in all states satisfying $\mathcal{I}$ and $inv_1$, $inv_2$), and we
have $ie_1\ \Re_1\ ie_1'$, $ie_1' \subseteq ie_2$, $ie_2\ \Re_2\ ie_2'$, and $ie_1(\theta) \subseteq ie_2'(\theta)$. As $inv$ is satisfied by
any $S$ that also satisfies both $inv_1$, $inv_2$ and $\mathcal{I}$, we have $\{inv_1, inv_2\} \cup \mathcal{I} \models inv$. □



Proof of Lemma 4.11. Let $inv_1 : ic_1 \Rightarrow ie_1\ \Re_1\ ie_1'$ and $inv_2 : ic_2 \Rightarrow ie_2\ \Re_2\ ie_2'$.
Then, either Combine_3 returns NIL or the derived invariant has the following
form

$$inv : simplify(ic_1 \vee ic_2) \Rightarrow ie_1\ \Re_1\ ie_1'.$$

In the latter case, $ie_1 = ie_2$, $\Re_1 = \Re_2$ and $ie_1' = ie_2'$ as implied by the Combine_3
algorithm.

We assume that we are given a state $S$ of the agent that satisfies both $inv_1$ and
$inv_2$. Let $inv_1\theta$ and $inv_2\theta$ be any ground instances of $inv_1$ and $inv_2$. Then, either
$ic_1(\theta)$ evaluates to false, or $ie_1(\theta)\ \Re_1\ ie_1'(\theta)$ is true in $S$. Similarly, either $ic_2(\theta)$
is false or $ie_1(\theta)\ \Re_1\ ie_1'(\theta)$ is true in $S$. We have four possible cases.

Case 1: Both $ic_1(\theta)$ and $ic_2(\theta)$ evaluate to false. Then $(ic_1 \vee ic_2)(\theta)$ also evaluates
to false, and $inv$ is also satisfied.

Case 2: $ic_1(\theta)$ evaluates to false and $ic_2(\theta)$ evaluates to true. Since $S \models inv_2$,
$ie_1(\theta)\ \Re_1\ ie_1'(\theta)$ is true in $S$. Then $S$ also satisfies $inv$.

Case 3: $ic_1(\theta)$ evaluates to true and $ic_2(\theta)$ evaluates to false. In this case,
$ie_1(\theta)\ \Re_1\ ie_1'(\theta)$ is true in $S$, since $S$ satisfies $inv_1$. Hence $S$ also satisfies $inv$.

Case 4: Both $ic_1(\theta)$ and $ic_2(\theta)$ evaluate to true. Again, since $S$ satisfies both
$inv_1$ and $inv_2$, $ie_1(\theta)\ \Re_1\ ie_1'(\theta)$ is true in $S$ and $inv$ is also satisfied. □
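The behaviour of Combine_3 used in this proof can be sketched as follows. This is only an illustration under simplifying assumptions: invariant conditions and expressions are opaque strings, the `simplify` step is omitted, and the names `Invariant` and `combine_3` are hypothetical rather than taken from the paper's implementation.

```python
from typing import NamedTuple, Optional

class Invariant(NamedTuple):
    cond: str  # invariant condition ic (opaque string in this sketch)
    lhs: str   # invariant expression ie
    rel: str   # relation: "=" or "subseteq"
    rhs: str   # invariant expression ie'

def combine_3(inv1: Invariant, inv2: Invariant) -> Optional[Invariant]:
    """Merge two invariants that differ only in their conditions.

    Returns None (playing the role of NIL) unless both invariants share
    the same left expression, relation and right expression.
    """
    if (inv1.lhs, inv1.rel, inv1.rhs) != (inv2.lhs, inv2.rel, inv2.rhs):
        return None
    # The paper additionally simplifies the disjunction; omitted here.
    return Invariant(f"({inv1.cond}) or ({inv2.cond})",
                     inv1.lhs, inv1.rel, inv1.rhs)

a = Invariant("V < 5", "ie1", "subseteq", "ie2")
b = Invariant("V > 9", "ie1", "subseteq", "ie2")
print(combine_3(a, b).cond)  # (V < 5) or (V > 9)
```

The four cases of the proof correspond to the four truth assignments of the two conditions in the merged disjunction.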



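Proof of Proposition 4.14. Suppose $X_1 \subseteq X_2$ and $inv \in C_{\mathcal{I}}(X_1)$. We need
to show that $inv \in C_{\mathcal{I}}(X_2)$. By definition of $C_{\mathcal{I}}$, there are five possible cases:

Case 1: $inv \in \mathcal{I}$, hence $inv \in C_{\mathcal{I}}(X_2)$ by definition of $C_{\mathcal{I}}$.

Case 2: $inv \in X_1$. As $X_1 \subseteq X_2$, $inv \in X_2$. Hence, $inv \in C_{\mathcal{I}}(X_2)$ by definition
of $C_{\mathcal{I}}$.

Case 3: $inv$ = Combine_1($inv_1$, $inv_2$, $\mathcal{I}$) where $inv_1, inv_2 \in \mathcal{I} \cup X_1$. But then
$inv_1, inv_2 \in \mathcal{I} \cup X_2$ as $X_1 \subseteq X_2$. Hence, $inv \in C_{\mathcal{I}}(X_2)$.

Case 4: $inv$ = Combine_2($inv_1$, $inv_2$) where $inv_1, inv_2 \in \mathcal{I} \cup X_1$. But then
$inv_1, inv_2 \in \mathcal{I} \cup X_2$ as $X_1 \subseteq X_2$. Hence, $inv \in C_{\mathcal{I}}(X_2)$.

Case 5: $inv$ = Combine_3($inv_1$, $inv_2$) where $inv_1, inv_2 \in \mathcal{I} \cup X_1$. But then
$inv_1, inv_2 \in \mathcal{I} \cup X_2$ as $X_1 \subseteq X_2$. Hence, $inv \in C_{\mathcal{I}}(X_2)$. □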



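Proof of Lemma 4.15. Let $inv \in C_{\mathcal{I}}(C_{\mathcal{I}}\!\uparrow^{\omega})$. We need to show $inv \in C_{\mathcal{I}}\!\uparrow^{\omega}$.
By the definition of $C_{\mathcal{I}}$, there are three possible cases:

Case 1: $inv \in \mathcal{I}$, then $inv \in C_{\mathcal{I}}\!\uparrow^{\omega}$ by the definition of $C_{\mathcal{I}}$.

Case 2: $inv \in C_{\mathcal{I}}\!\uparrow^{\omega}$, which is trivial.

Case 3: $inv$ = Combine_1($inv_1$, $inv_2$, $\mathcal{I}$) (or $inv$ = Combine_2($inv_1$, $inv_2$) or
$inv$ = Combine_3($inv_1$, $inv_2$)) such that $inv_1, inv_2 \in \mathcal{I} \cup (C_{\mathcal{I}}\!\uparrow^{\omega})$. There exists a
smallest integer $k_i$ ($i = 1, 2$) such that $inv_i \in \mathcal{I} \cup (C_{\mathcal{I}}\!\uparrow^{k_i})$. Let $k := \max(k_1, k_2)$.
Then, $inv_1, inv_2 \in \mathcal{I} \cup (C_{\mathcal{I}}\!\uparrow^{k})$. By definition of $C_{\mathcal{I}}$ and as $\mathcal{I} \subseteq C_{\mathcal{I}}\!\uparrow^{k}$, $inv \in
C_{\mathcal{I}}\!\uparrow^{(k+1)}$. Hence, $inv \in C_{\mathcal{I}}\!\uparrow^{\omega}$. □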



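Proof of Lemma 4.16. Suppose $inv \in C_{\mathcal{I}}\!\uparrow^{\omega}$. Then, there exists a smallest
integer $k$ such that $inv \in C_{\mathcal{I}}\!\uparrow^{k}$. The proof is by induction on $k$. Let the inductive
hypothesis be defined as $\forall k' : 1 \le k' < k$, if $inv \in C_{\mathcal{I}}\!\uparrow^{k'}$, then $\mathcal{I} \models inv$.

Base Step: $k = 1$, $inv \in C_{\mathcal{I}}\!\uparrow^{1}$. Then there are four possible cases:

(1) $inv \in \mathcal{I}$, hence $\mathcal{I} \models inv$.

(2) $inv$ = Combine_1($inv_1$, $inv_2$, $\mathcal{I}$) where $inv_1, inv_2 \in \mathcal{I}$.
As $inv_1, inv_2 \in \mathcal{I}$, $\mathcal{I} \models inv_1, inv_2$. Then, by Lemma 4.9, $\{inv_1, inv_2\} \models inv$.
Therefore, $\mathcal{I} \models inv$.

(3) $inv$ = Combine_2($inv_1$, $inv_2$) where $inv_1, inv_2 \in \mathcal{I}$.
Since $inv_1, inv_2 \in \mathcal{I}$, $\mathcal{I} \models inv_1, inv_2$. Then, by Lemma 4.10, $\{inv_1, inv_2\} \models inv$.
Therefore, $\mathcal{I} \models inv$.

(4) $inv$ = Combine_3($inv_1$, $inv_2$) where $inv_1, inv_2 \in \mathcal{I}$.
As $inv_1, inv_2 \in \mathcal{I}$, $\mathcal{I} \models inv_1, inv_2$. Then, by Lemma 4.11, $\{inv_1, inv_2\} \models inv$.
Therefore, $\mathcal{I} \models inv$.

Inductive Step: $k > 1$. Let $inv \in C_{\mathcal{I}}\!\uparrow^{k}$. Then, there exist $inv_1, inv_2 \in C_{\mathcal{I}}\!\uparrow^{(k-1)}$
such that $inv$ is derived by one of the Combine_1, Combine_2 or Combine_3
operators. That is, either $inv$ = Combine_1($inv_1$, $inv_2$, $\mathcal{I}$), or $inv$ =
Combine_2($inv_1$, $inv_2$), or $inv$ = Combine_3($inv_1$, $inv_2$). This is the
only possibility, as $inv \notin C_{\mathcal{I}}\!\uparrow^{k'}$ for all $k' < k$, by definition of $k$. By the inductive
hypothesis $\mathcal{I} \models inv_1$ and $\mathcal{I} \models inv_2$. By Lemma 4.9 (respectively Lemma 4.10 or
Lemma 4.11, depending on the operator used), $\{inv_1, inv_2\} \models inv$. Hence,
$\mathcal{I} \models inv$. □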
Lemma A.2. We consider predicate logic with equality and a binary predicate
symbol $<$. The language also contains arbitrary constants and parameters from the
reals.

We consider a special class of formulae, namely universally quantified formulae of
the form $ic \rightarrow (P_i(\underline{t}) \rightarrow P_j(\underline{t}'))$, where $\underline{t}, \underline{t}'$ are tuples consisting of variables, constants
and parameters, $P_i$ are predicate symbols and $ic$ is an invariant condition involving
equality, $<$, variables, constants and parameters. We call this class inv-formulae.
Let $T$ be a set of inv-formulae.

The proof system consisting of (R0): $\dfrac{\varphi}{\varphi\,\theta}$, where $\theta$ is any substitution for
the variables in the tuple $\underline{x}$ and $\varphi$ is an inv-formula, and the two inference rules
(R1) and (R2) below is complete for the class of inv-formulae: for each formula
$ic \rightarrow (P_i(\underline{t}) \rightarrow P_j(\underline{t}'))$ which follows from $T$, there is an instance of a derived formula
which is identical to it. And each derived formula also follows from $T$.

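$$(\mathrm{R1})\quad \frac{ic_1 \rightarrow (P_1(\underline{t}_1) \rightarrow P_1'(\underline{t}_1')) \qquad ic_2 \rightarrow (P_2(\underline{t}_2) \rightarrow P_2'(\underline{t}_2'))}{simplify((ic_1 \wedge ic_2)\theta) \rightarrow (P_1(\underline{t}_1)\theta \rightarrow P_2'(\underline{t}_2')\theta)}$$

where $\theta$ is such that $P_1'(\underline{t}_1')\theta = P_2(\underline{t}_2)\theta$.

$$(\mathrm{R2})\quad \frac{ic_1 \rightarrow (P_1(\underline{t}_1) \rightarrow P_1'(\underline{t}_1')) \qquad ic_2 \rightarrow (P_2(\underline{t}_2) \rightarrow P_2'(\underline{t}_2'))}{simplify((ic_1 \vee ic_2)\theta) \rightarrow (P_1(\underline{t}_1)\theta \rightarrow P_1'(\underline{t}_1')\theta)}$$

where $P_1(\underline{t}_1)\theta = P_2(\underline{t}_2)\theta$ and $P_1'(\underline{t}_1')\theta = P_2'(\underline{t}_2')\theta$.

The $simplify$ routine simplifies invariant conditions (containing the binary symbol
$<$) with respect to the theory of real numbers in the signature $<$, $=$, and arbitrary
constants and parameters in the reals.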

Proof. The correctness of the system is obvious, as all rules have this property.
The completeness follows by adapting the classical completeness proof of first-order
logic and taking into account the special form of the inv-formulae. Let

$$\varphi : ic^* \rightarrow (P^*(\underline{t}) \rightarrow P^{*\prime}(\underline{t}'))$$

be a formula that follows from $T$. Then the set

$$T \cup \{\exists(ic^* \wedge P^*(\underline{t}) \wedge \neg P^{*\prime}(\underline{t}'))\}$$

is unsatisfiable (because $T \models \varphi$). Therefore it suffices to show the following claim:

Given a set $T \cup \{\varphi\}$ of inv-formulae, whenever $T \nvdash \varphi$, then $T \cup \{\neg\varphi\}$
is satisfiable.

Because then the assumption $T \nvdash \varphi$ leads to a contradiction. Therefore, taking
into account (R0), we can conclude that at least an inv-formula $\varphi'$ of the form

$$ic \rightarrow (P(\underline{t}) \rightarrow P'(\underline{t}')),$$

with $\varphi = \varphi'\theta$ must be derivable.

The claim can be shown by establishing that each consistent set $T \cup \{\neg\varphi\}$ containing
inv-formulae and their negations can be extended to a maximally consistent set
$\Phi_{T \cup \{\neg\varphi\}}$ which contains witnesses.¹ For such sets, the following holds:

(1) $\Phi_{T \cup \{\neg\varphi\}} \vdash \gamma$ implies $\gamma \in \Phi_{T \cup \{\neg\varphi\}}$;
(2) for all $\gamma$: $\gamma \in \Phi_{T \cup \{\neg\varphi\}}$ or $\neg\gamma \in \Phi_{T \cup \{\neg\varphi\}}$; and
(3) $\exists x\,\gamma \in \Phi_{T \cup \{\neg\varphi\}}$ implies that there is a term $t$ with $\gamma[t/x] \in \Phi_{T \cup \{\neg\varphi\}}$.

These properties induce in a natural way an interpretation which is a model of
$\Phi_{T \cup \{\neg\varphi\}}$. □

Proof of Theorem 4.17. The proof is by reducing the statement into predicate
logic using Lemma A.1. We are then in a situation to apply Lemma A.2.
Note that the inference rules of Lemma A.2 act on inv-formulae exactly as
Combine_1 and Combine_3 on invariants. Therefore there is a bijection between proofs
in the proof system described in Lemma A.2 and derivations of invariants using
Combine_1 and Combine_3. □



Proof of Corollary 4.18. We are reducing the statement to Theorem 4.17.
We transform each invariant with $\Re$ = "$=$" into two separate invariants with
$\Re$ = "$\subseteq$".

If $inv$ is of the form $ic \Rightarrow ie_1 \subseteq ie_2$, we are done, because

(1) the set of transformed invariants is equivalent to the original ones, and

(2) although deriving invariants with $\Re$ = "$=$" is possible (such invariants are
contained in the set Taut and new ones will be generated by Combine_1² and
by Combine_3), for all such invariants we have also both their $\subseteq$ counterparts
(this can be easily shown by induction).

Let us suppose therefore that $inv$ has the form $ic \Rightarrow ie_1 = ie_2$. We know that both
$ic \Rightarrow ie_1 \subseteq ie_2$ and $ic \Rightarrow ie_1 \supseteq ie_2$ are entailed by $\mathcal{I}$ and we apply Theorem 4.17 to
these cases. We can assume without loss of generality that neither of these two
invariants is a tautology (otherwise we are done).

Thus there are $inv'$ (for $\subseteq$) and $inv''$ (for $\supseteq$). We apply Lemma 4.7 and get that
$inv'$ (resp. $inv''$) has the form $ic' \Rightarrow ie_1 \subseteq ie_2$ (resp. $ic'' \Rightarrow ie_1 \supseteq ie_2$). By symmetry
$ic'$ is equivalent (in fact, by using a deterministic strategy it can be made identical)
to $ic''$. Thus by our Combine_2, there is also a derived invariant of the form

$$ic' \Rightarrow ie_1 = ie_2,$$

and this derived invariant clearly entails $ic \Rightarrow ie_1 = ie_2$ (because $inv'$ entails
$ic \Rightarrow ie_1 \subseteq ie_2$ and $inv''$ entails $ic \Rightarrow ie_1 \supseteq ie_2$). □



Proof of Corollary 4.19.

(1): The proof is by induction on the iteration of the while loop in the
Compute-Derived-Invariants algorithm. Let the inductive hypothesis be $\forall i \ge 0$, if $inv$
is inserted into $X$ in iteration $i$, then $\mathcal{I} \models inv$.

Base Step: For $i = 0$, $inv \in \mathcal{I}$, $inv \models inv$, hence $\mathcal{I} \models inv$.

Inductive Step: Let $inv$ be inserted into $X$ in iteration $i > 0$, and $inv$ =
Combine_1($inv_1$, $inv_2$, $\mathcal{I}$), where $inv_1$ and $inv_2$ are inserted into $X$ at step
$(i - 1)$ or earlier. Then, by the inductive hypothesis, $\mathcal{I} \models inv_1$ and $\mathcal{I} \models inv_2$.
By Lemma 4.9, $\{inv_1, inv_2\} \models inv$, hence $\mathcal{I} \models inv$.

(2): First note that the Compute-Derived-Invariants algorithm computes and
returns $C_{\mathcal{I}}\!\uparrow^{\omega}$. The result follows from Theorem 4.17 and Corollary 4.18. □

¹This is analogous to the classical Henkin proof of the completeness of first-order logic. In our case
the theory $T$ in question contains only finitely many free variables, which simplifies the original
proof.

²See the first line in Table 1.
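The fixpoint reading of derived invariants can be sketched as a closure loop. The sketch below is an assumption-laden toy, not the paper's Compute-Derived-Invariants algorithm: an invariant is reduced to a pair `(x, y)` standing for "x is contained in y", and the only combine operator is a transitivity rule. The names `derive_closure` and `transitivity` are hypothetical.

```python
def derive_closure(base, combine_ops):
    """Apply the combine operators until no new invariant appears."""
    derived = set(base)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow inside the loop.
        for a in list(derived):
            for b in list(derived):
                for op in combine_ops:
                    new = op(a, b)
                    if new is not None and new not in derived:
                        derived.add(new)
                        changed = True
    return derived

def transitivity(a, b):
    """Toy combine operator: from x <= y and y <= z derive x <= z."""
    return (a[0], b[1]) if a[1] == b[0] else None

base = {("A", "B"), ("B", "C"), ("C", "D")}
closure = derive_closure(base, [transitivity])
print(sorted(closure))
```

Each pass of the `while` loop plays the role of one application of the operator to the set derived so far, and the loop stops once a pass adds nothing new. Every derived pair is justified by pairs that were inserted earlier, which mirrors the induction in part (1).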



B. AXIOMATIC INFERENCE SYSTEM 





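Inference Rules                                   Equivalence Rules

$A \subseteq A$                                   $A \cup A = A$ and $A \cap A = A$

$(A \cap B) \subseteq A$                          $A \cup \emptyset = A$ and $A \cap \emptyset = \emptyset$

$A \subseteq (A \cup B)$                          $(A \cup B) \cup C = A \cup (B \cup C)$ and $(A \cap B) \cap C = A \cap (B \cap C)$

$((A \cup B) \cap \neg B) \subseteq A$            $A \cup B = B \cup A$ and $A \cap B = B \cap A$

$A \subseteq ((A \cap B) \cup \neg B)$            $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$

if $A \subseteq B$ and $B \subseteq C$            $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
then $A \subseteq C$

if $A \subseteq B$ and $C \subseteq D$            $\neg(A \cup B) = \neg A \cap \neg B$ and $\neg(A \cap B) = \neg A \cup \neg B$
then $(A \cap C) \subseteq (B \cap D)$

if $A \subseteq B$ and $C \subseteq D$            $A \cup (A \cap B) = A$ and $A \cap (A \cup B) = A$
then $(A \cap C) \subseteq (B \cup D)$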
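The rules above are ordinary set-theoretic facts, so they can be sanity-checked exhaustively on a small universe. The following sketch is only such a check, assuming complement is taken relative to a fixed four-element universe `U`; the helper names `subsets`, `neg` and `check_rules` are hypothetical.

```python
from itertools import combinations

U = frozenset(range(4))  # small fixed universe for the complement

def subsets(universe):
    """Enumerate all subsets of the universe as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def neg(a):
    """Complement relative to U."""
    return U - a

def check_rules():
    """Check a selection of the inference and equivalence rules."""
    all_sets = subsets(U)
    for a in all_sets:
        for b in all_sets:
            assert (a & b) <= a
            assert a <= (a | b)
            assert ((a | b) & neg(b)) <= a
            assert a <= ((a & b) | neg(b))
            assert neg(a | b) == neg(a) & neg(b)
            assert neg(a & b) == neg(a) | neg(b)
            assert a | (a & b) == a and a & (a | b) == a
            for c in all_sets:
                assert a | (b & c) == (a | b) & (a | c)
                assert a & (b | c) == (a & b) | (a & c)
    return True

print(check_rules())  # True
```

Passing this check on one finite universe does not prove the rules; it only guards against transcription mistakes in the table.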



C. INVARIANTS FOR THE SPATIAL AND RELATIONAL DOMAINS 



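$T = T' \wedge L = L' \wedge R = R' \Rightarrow$
    $\mathbf{in}(Y, spatial{:}vertical(T, L, R)) = \mathbf{in}(Y, spatial{:}vertical(T', L', R'))$

$T = T' \wedge L = L' \wedge R < R' \Rightarrow$
    $\mathbf{in}(Y, spatial{:}vertical(T, L, R)) \subseteq \mathbf{in}(Y, spatial{:}vertical(T', L', R'))$

$T = T' \wedge R = R' \wedge L > L' \Rightarrow$
    $\mathbf{in}(Y, spatial{:}vertical(T, L, R)) \subseteq \mathbf{in}(Y, spatial{:}vertical(T', L', R'))$

$T = T' \wedge R < R' \wedge L > L' \Rightarrow$
    $\mathbf{in}(Y, spatial{:}vertical(T, L, R)) \subseteq \mathbf{in}(Y, spatial{:}vertical(T', L', R'))$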


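$T = T' \wedge B = B' \wedge U = U' \Rightarrow$
    $\mathbf{in}(Y, spatial{:}horizontal(T, B, U)) = \mathbf{in}(Y, spatial{:}horizontal(T', B', U'))$

$T = T' \wedge B = B' \wedge U < U' \Rightarrow$
    $\mathbf{in}(Y, spatial{:}horizontal(T, B, U)) \subseteq \mathbf{in}(Y, spatial{:}horizontal(T', B', U'))$

$T = T' \wedge U = U' \wedge B > B' \Rightarrow$
    $\mathbf{in}(Y, spatial{:}horizontal(T, B, U)) \subseteq \mathbf{in}(Y, spatial{:}horizontal(T', B', U'))$

$T = T' \wedge U < U' \wedge B > B' \Rightarrow$
    $\mathbf{in}(Y, spatial{:}horizontal(T, B, U)) \subseteq \mathbf{in}(Y, spatial{:}horizontal(T', B', U'))$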


$T = T' \wedge X = X' \wedge Y = Y' \wedge Rad = Rad' \Rightarrow$
    $\mathbf{in}(Z, spatial{:}range(T, X, Y, Rad)) = \mathbf{in}(W, spatial{:}range(T', X', Y', Rad'))$

$T = T' \wedge X = X' \wedge Y = Y' \wedge Rad < Rad' \Rightarrow$
    $\mathbf{in}(Z, spatial{:}range(T, X, Y, Rad)) \subseteq \mathbf{in}(W, spatial{:}range(T', X', Y', Rad'))$

Table 7. Invariants for the spatial domain (1)
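The containment invariant for `range` can be illustrated with a small sketch. It assumes, purely for illustration, that `spatial:range(T, X, Y, Rad)` returns the points of table `T` lying within distance `Rad` of `(X, Y)`; the function `spatial_range` and the sample points are hypothetical.

```python
import math

def spatial_range(points, x, y, rad):
    """Points within Euclidean distance rad of (x, y)."""
    return {p for p in points if math.hypot(p[0] - x, p[1] - y) <= rad}

pts = {(0, 0), (1, 1), (3, 4), (6, 8)}

small = spatial_range(pts, 0, 0, 2)  # Rad  = 2
large = spatial_range(pts, 0, 0, 6)  # Rad' = 6, same table and center

# Invariant: same T, X, Y and Rad < Rad' gives small as a subset of large.
print(small <= large)

# Consequence for merging requests: the smaller answer can be filtered
# out of the larger one without querying the full table again.
print(spatial_range(large, 0, 0, 2) == small)
```

This is the kind of saving the invariants are meant to expose: once the larger request has been evaluated, the smaller one reduces to a cheap selection over its result.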
$T = T' \wedge R < R' \wedge L < L' \wedge L' < R \Rightarrow$
    $\mathbf{in}(Y, spatial{:}vertical(T, L, R)) \cup \mathbf{in}(Y, spatial{:}vertical(T', L', R'))$
    $=$
    $\mathbf{in}(Y, spatial{:}vertical(T, L, R'))$

$T = T' \wedge R > R' \wedge L > L' \wedge L < R' \Rightarrow$
    $\mathbf{in}(Y, spatial{:}vertical(T, L, R)) \cup \mathbf{in}(Y, spatial{:}vertical(T', L', R'))$
    $=$
    $\mathbf{in}(Y, spatial{:}vertical(T, L', R))$

$T = T' \wedge U < U' \wedge B < B' \wedge B' < U \Rightarrow$
    $\mathbf{in}(Y, spatial{:}horizontal(T, B, U)) \cup \mathbf{in}(Y, spatial{:}horizontal(T', B', U'))$
    $=$
    $\mathbf{in}(Y, spatial{:}horizontal(T', B, U'))$

$T = T' \wedge U > U' \wedge B > B' \wedge B < U' \Rightarrow$
    $\mathbf{in}(Y, spatial{:}horizontal(T, B, U)) \cup \mathbf{in}(Y, spatial{:}horizontal(T', B', U'))$
    $=$
    $\mathbf{in}(Y, spatial{:}horizontal(T', B', U))$

Table 8. Invariants for the spatial domain (2)
$Rel = Rel' \wedge Attr = Attr' \wedge Op = Op' \wedge V = V' \Rightarrow$
    $\mathbf{in}(X, rel{:}select(Rel, Attr, Op, V)) = \mathbf{in}(Y, rel{:}select(Rel', Attr', Op', V'))$

$Rel = Rel' \wedge Attr = Attr' \wedge Op = Op' = \text{"}<\text{"} \wedge V < V' \Rightarrow$
    $\mathbf{in}(X, rel{:}select(Rel, Attr, Op, V)) \subseteq \mathbf{in}(Y, rel{:}select(Rel', Attr', Op', V'))$

$Rel = Rel' \wedge Attr = Attr' \wedge Op = Op' = \text{"}>\text{"} \wedge V > V' \Rightarrow$
    $\mathbf{in}(X, rel{:}select(Rel, Attr, Op, V)) \subseteq \mathbf{in}(Y, rel{:}select(Rel', Attr', Op', V'))$

$Rel = Rel' \wedge Attr = Attr' \wedge V1 = V1' \wedge V2 = V2' \Rightarrow$
    $\mathbf{in}(X, rel{:}rngselect(Rel, Attr, V1, V2)) = \mathbf{in}(Y, rel{:}rngselect(Rel', Attr', V1', V2'))$

$Rel = Rel' \wedge Attr = Attr' \wedge V1 > V1' \wedge V2 < V2' \Rightarrow$
    $\mathbf{in}(X, rel{:}rngselect(Rel, Attr, V1, V2)) \subseteq \mathbf{in}(Y, rel{:}rngselect(Rel', Attr', V1', V2'))$

$Rel = Rel' \wedge Attr = Attr' \wedge V1 < V1' \wedge V2 < V2' \wedge V1' < V2 \Rightarrow$
    $\mathbf{in}(X, rel{:}rngselect(Rel, Attr, V1, V2)) \cup \mathbf{in}(Y, rel{:}rngselect(Rel', Attr', V1', V2'))$
    $=$
    $\mathbf{in}(Z, rel{:}rngselect(Rel, Attr, V1, V2'))$

$Rel = Rel' \wedge Attr = Attr' \wedge V1 > V1' \wedge V2 > V2' \wedge V1 < V2' \Rightarrow$
    $\mathbf{in}(X, rel{:}rngselect(Rel, Attr, V1, V2)) \cup \mathbf{in}(Y, rel{:}rngselect(Rel', Attr', V1', V2'))$
    $=$
    $\mathbf{in}(Z, rel{:}rngselect(Rel, Attr, V1', V2))$

Table 9. Invariants for the relational domain (1)
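The union invariant for overlapping range selections can be checked on a toy relation. The sketch assumes that `rel:rngselect(Rel, Attr, V1, V2)` returns the tuples whose attribute value lies in the closed interval from `V1` to `V2`; the relation `emp`, its columns, and the Python function `rngselect` are hypothetical stand-ins.

```python
def rngselect(rel, attr, lo, hi):
    """Tuples of rel whose column attr lies in the closed range [lo, hi]."""
    return {row for row in rel if lo <= row[attr] <= hi}

# Toy relation of (id, salary) tuples; column 1 is the selection attribute.
emp = [(1, 10), (2, 25), (3, 40), (4, 55), (5, 70)]

a = rngselect(emp, 1, 10, 40)       # V1  = 10, V2  = 40
b = rngselect(emp, 1, 25, 60)       # V1' = 25, V2' = 60
merged = rngselect(emp, 1, 10, 60)  # V1, V2'

# Condition of the invariant: V1 < V1', V2 < V2' and V1' < V2 (overlap).
print((a | b) == merged)
```

Under these conditions two overlapping range requests can be served by one merged request over the interval from `V1` to `V2'`, and each original answer can then be recovered by filtering the merged result.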
$Rel = Rel' \wedge Attr1 = Attr1' \wedge Attr2 = Attr2' \wedge V1' = V1 \wedge V2 = V2' \wedge V3' = V3 \wedge V4 = V4' \Rightarrow$
    $\mathbf{in}(X, rel{:}rngselect(Rel, Attr1, V1, V2)) \cap \mathbf{in}(Y, rel{:}rngselect(Rel, Attr2, V3, V4))$
    $=$
    $\mathbf{in}(Z, rel{:}rngselect(Rel', Attr1', V1', V2')) \cap \mathbf{in}(W, rel{:}rngselect(Rel, Attr2, V3', V4'))$

$Rel = Rel' \wedge Attr1 = Attr1' \wedge Attr2 = Attr2' \wedge V1' < V1 \wedge V2 < V2' \wedge V3' < V3 \wedge V4 < V4' \Rightarrow$
    $\mathbf{in}(X, rel{:}rngselect(Rel, Attr1, V1, V2)) \cap \mathbf{in}(Y, rel{:}rngselect(Rel, Attr2, V3, V4))$
    $\subseteq$
    $\mathbf{in}(Z, rel{:}rngselect(Rel', Attr1', V1', V2')) \cap \mathbf{in}(W, rel{:}rngselect(Rel, Attr2, V3', V4'))$

$Rel = Rel' \wedge Attr1 = Attr1' \wedge Attr2 = Attr2' \wedge V1' < V1 \wedge V2' < V2 \wedge V3 < V3' \wedge V4 < V4' \wedge V1 < V2' \wedge V3' < V4 \Rightarrow$
    $(\mathbf{in}(X, rel{:}rngselect(Rel, Attr1, V1, V2)) \cap \mathbf{in}(X, rel{:}rngselect(Rel, Attr2, V3, V4))) \;\cup$
    $(\mathbf{in}(X, rel{:}rngselect(Rel', Attr1', V1', V2')) \cap \mathbf{in}(X, rel{:}rngselect(Rel, Attr2, V3', V4')))$
    $\subseteq$
    $\mathbf{in}(X, rel{:}rngselect(Rel, Attr1, V1', V2)) \cap \mathbf{in}(X, rel{:}rngselect(Rel, Attr2, V3, V4'))$

$Rel = Rel' \wedge Attr1 = Attr1' \wedge Attr2 = Attr2' \wedge V1 < V1' \wedge V2 < V2' \wedge V3 < V3' \wedge V4 < V4' \wedge V1' < V2 \wedge V3' < V4 \Rightarrow$
    $(\mathbf{in}(X, rel{:}rngselect(Rel, Attr1, V1, V2)) \cap \mathbf{in}(Y, rel{:}rngselect(Rel, Attr2, V3, V4))) \;\cup$
    $(\mathbf{in}(Z, rel{:}rngselect(Rel', Attr1', V1', V2')) \cap \mathbf{in}(W, rel{:}rngselect(Rel, Attr2, V3', V4')))$
    $\subseteq$
    $\mathbf{in}(X', rel{:}rngselect(Rel, Attr1, V1, V2')) \cap \mathbf{in}(Y', rel{:}rngselect(Rel, Attr2, V3, V4'))$

$Rel = Rel' \wedge Attr1 = Attr1' \wedge Attr2 = Attr2' \wedge V1 < V1' \wedge V2 < V2' \wedge V3' < V3 \wedge V4' < V4 \wedge V1' < V2 \wedge V3 < V4' \Rightarrow$
    $(\mathbf{in}(X, rel{:}rngselect(Rel, Attr1, V1, V2)) \cap \mathbf{in}(Y, rel{:}rngselect(Rel, Attr2, V3, V4))) \;\cup$
    $(\mathbf{in}(Z, rel{:}rngselect(Rel', Attr1', V1', V2')) \cap \mathbf{in}(W, rel{:}rngselect(Rel, Attr2, V3', V4')))$
    $\subseteq$
    $\mathbf{in}(X', rel{:}rngselect(Rel, Attr1, V1, V2')) \cap \mathbf{in}(Y', rel{:}rngselect(Rel, Attr2, V3', V4))$

$Rel = Rel' \wedge Attr1 = Attr1' \wedge Attr2 = Attr2' \wedge V1' < V1 \wedge V2' < V2 \wedge V3' < V3 \wedge V4' < V4 \wedge V1 < V2' \wedge V3 < V4' \Rightarrow$
    $(\mathbf{in}(X, rel{:}rngselect(Rel, Attr1, V1, V2)) \cap \mathbf{in}(Y, rel{:}rngselect(Rel, Attr2, V3, V4))) \;\cup$
    $(\mathbf{in}(Z, rel{:}rngselect(Rel', Attr1', V1', V2')) \cap \mathbf{in}(W, rel{:}rngselect(Rel, Attr2, V3', V4')))$
    $\subseteq$
    $\mathbf{in}(X', rel{:}rngselect(Rel, Attr1, V1', V2)) \cap \mathbf{in}(Y', rel{:}rngselect(Rel, Attr2, V3', V4))$

Table 10. Invariants for the relational domain (2)



